@ainyc/canonry 4.57.0 → 4.59.0

Files changed (26)
  1. package/assets/agent-workspace/skills/canonry/SKILL.md +7 -0
  2. package/assets/agent-workspace/skills/canonry/references/canonry-cli.md +44 -0
  3. package/assets/agent-workspace/skills/canonry/references/google-business-profile.md +219 -0
  4. package/assets/assets/{BacklinksPage-CmeFZ8UJ.js → BacklinksPage-D_mc7c-b.js} +1 -1
  5. package/assets/assets/{ChartPrimitives-D7C1Cp8w.js → ChartPrimitives-BViWneKX.js} +1 -1
  6. package/assets/assets/{ProjectPage-Y6uCyjGb.js → ProjectPage-_hpYJAN1.js} +1 -1
  7. package/assets/assets/{RunRow-BntNdrgM.js → RunRow-DK69_0iD.js} +1 -1
  8. package/assets/assets/{RunsPage-Btp6qn10.js → RunsPage-DRu1peAA.js} +1 -1
  9. package/assets/assets/{SettingsPage-DkyNiU2i.js → SettingsPage-BrednApH.js} +1 -1
  10. package/assets/assets/{TrafficPage-CBl4Mwdc.js → TrafficPage-oFA65ZZc.js} +1 -1
  11. package/assets/assets/{TrafficSourceDetailPage-BZzuWCn-.js → TrafficSourceDetailPage-CUzzaYFC.js} +1 -1
  12. package/assets/assets/{extract-error-message-De8_qAzs.js → extract-error-message-Cv4MXGtB.js} +1 -1
  13. package/assets/assets/{index-XUKhruAg.js → index-BrCh3uvb.js} +90 -90
  14. package/assets/assets/{server-traffic-bn9LSZN9.js → server-traffic-rYE-NlE-.js} +1 -1
  15. package/assets/assets/{trash-2-B5clF2rU.js → trash-2-BgGGPjQf.js} +1 -1
  16. package/assets/index.html +1 -1
  17. package/dist/{chunk-4KWPOVIT.js → chunk-JW6TQFU7.js} +171 -1
  18. package/dist/{chunk-WFVUZVJD.js → chunk-LPPW7O26.js} +1421 -1190
  19. package/dist/{chunk-HL6JZUEW.js → chunk-NOQ4ZE3E.js} +2158 -624
  20. package/dist/{chunk-6X5TF73A.js → chunk-TFBPLY77.js} +414 -1
  21. package/dist/cli.js +610 -458
  22. package/dist/index.d.ts +2 -1
  23. package/dist/index.js +4 -4
  24. package/dist/{intelligence-service-NY3MAVPB.js → intelligence-service-V4SWVKEQ.js} +2 -2
  25. package/dist/mcp.js +10 -8
  26. package/package.json +8 -7
@@ -109,6 +109,12 @@ When the project ships behind a server you control, wire crawler + AI-referral e
109
109
 
110
110
  **Vercel gotcha:** a freshly connected Vercel source captures only going-forward traffic — `lastSyncedAt` is seeded to NOW to avoid the 30-day default window exceeding Vercel's ~14-day request-logs retention (which would otherwise throw on every first sync). Use `cnry traffic backfill <project> --source <id> --days N` for historical recovery. If an idle Vercel/Cloud Run source has been failing long enough that `lastSyncedAt` aged past retention, unstick it with `cnry traffic reset <project> --source <id> --advance-to-now`.
111
111
 
112
+ ## Local AEO (Google Business Profile)
113
+
114
+ For businesses with a physical location or service area, Google Business Profile is the local-AEO signal source: reviews, search-keyword impressions, daily performance metrics, and (for hotels) structured amenities + booking CTAs all feed how AI engines answer local-intent queries. Connect with `cnry gbp connect <project>`, discover locations with `cnry gbp locations discover <project>`, and choose which locations sync with `cnry gbp locations select` / `cnry gbp locations deselect`.
115
+
116
+ **Hard prerequisites and gotchas — read `references/google-business-profile.md` before attempting setup:** GBP requires a Google access-form approval (0 QPM until granted), the only OAuth scope is the write-capable `business.manage`, **reviews live on a separately-gated legacy v4 API that the Basic approval does NOT grant** (and can't be self-enabled), and the **Q&A API is permanently shut down (HTTP 501)**. Keyword data is heavily privacy-redacted (often 100% for small businesses), and empty lodging/place-action profiles are themselves the AEO finding to surface. The reference doc has the full setup walkthrough, the real-world data shapes, and the troubleshooting matrix.
117
+
112
118
  ## Built-in Analyst (Aero)
113
119
 
114
120
  Canonry ships a built-in agent — Aero — for users who don't already have one. Drive it from the CLI:
@@ -138,6 +144,7 @@ Aero also wakes unprompted after every `run.completed` so insights and regressio
138
144
  | `references/indexing.md` | Submitting URLs, checking GSC/Bing coverage, fixing indexing gaps |
139
145
  | `references/wordpress-integration.md` | Connecting to WordPress, editing pages, pushing staging → live |
140
146
  | `references/server-side-traffic.md` | Wiring server-log evidence (Cloud Run, WordPress, Vercel adapters) for AI Visibility — Server-Side. Connect, sync, manage sources, troubleshoot. |
147
+ | `references/google-business-profile.md` | Connecting Google Business Profile for local AEO: access-form approval, GCP API enablement, the v4-reviews access gate, hotel lodging/place-action signals, data shapes, troubleshooting. |
141
148
 
142
149
  ---
143
150
 
@@ -363,6 +363,50 @@ cnry ga coverage <project> # per-page overlay: {landingPage,
363
363
 
364
364
  Every read command queries persisted DB rows, so a stale `lastSyncedAt` means the response is stale — always check `ga status` before drawing conclusions, and re-`ga sync` if the data is older than the analysis window. Use `--only ai` or `--only social` to refresh just one slice when iterating.
365
365
 
366
+ ## Google Business Profile (Local AEO)
367
+
368
+ GBP integration tracks how AI engines see a business's local presence — search-keyword impressions, daily performance metrics, hotel lodging attributes, and booking CTAs. It reuses the **Google OAuth client** (same `google.clientId`/`clientSecret` as GSC; the connection is stored under the `gbp` connection type). **Hard prerequisite:** the Google Cloud project must be approved through Google's Business Profile API Basic Access form, or every call returns HTTP 403 at 0 QPM. See `references/google-business-profile.md` for the full GCP-setup + access-request playbook, the reviews/Q&A gating, and real-world data-shape quirks.
369
+
370
+ Like GA4, `gbp sync` writes to local DB tables and every read command queries the local store — reads are fast and quotaless; a stale sync means stale reads. All commands support `--format json`.
371
+
372
+ ```bash
373
+ cnry gbp connect <project> [--public-url <url>] # OAuth connect (reuses the Google client)
374
+ cnry gbp disconnect <project> # remove the GBP connection + ALL synced GBP data
375
+ cnry gbp accounts <project> # list GBP accounts this connection can access
376
+ # (account selection is per project — pick one below)
377
+ cnry gbp locations discover <project> [--account accounts/{n}] [--switch-account] [--no-select-new]
378
+ # discover a chosen account's locations; --account targets a
379
+ # specific account (omit = the account the project already tracks,
380
+ # else the first visible one); --switch-account opts into the
381
+ # destructive re-point to a different account; selects all new by default
382
+ cnry gbp locations <project> [--selected-only] # list discovered locations + selection state
383
+ cnry gbp locations select <project> --location locations/{n}
384
+ cnry gbp locations deselect <project> --location locations/{n}
385
+ # only SELECTED locations are synced
386
+ cnry gbp sync <project> [--location locations/{n}] [--days N] [--months N] [--wait]
387
+ # fires the gbp-sync run: daily metrics + keyword impressions
388
+ # + place-action links + lodging snapshot per selected location;
389
+ # --wait polls to a terminal run status
390
+ cnry gbp metrics <project> [--location locations/{n}] [--metric <DailyMetric>]
391
+ # stored daily metrics + totals-by-metric
392
+ cnry gbp keywords <project> [--location locations/{n}]
393
+ # stored search-keyword impressions over the synced
394
+ # periodStart..periodEnd window; renders exact counts and
395
+ # <N thresholded floors + a thresholdedPct fidelity stat
396
+ cnry gbp place-actions <project> [--location locations/{n}]
397
+ # booking / reservation / order CTAs per location, with
398
+ # placeActionType, providerType (MERCHANT vs AGGREGATOR), isPreferred, uri
399
+ cnry gbp lodging <project> [--location locations/{n}]
400
+ # latest hotel-attribute snapshot per location (snapshot-on-change):
401
+ # populatedGroupCount + syncedAt; empty profiles are an AEO gap, not an error
402
+ cnry gbp summary <project> [--location locations/{n}]
403
+ # composite scorecard: performance totals + recent-vs-prior 7d
404
+ # deltas (deltaPct null when prior=0), keyword coverage,
405
+ # place-action CTA presence flags, lodging completeness counts
406
+ ```
407
+
408
+ `gbp sync` produces a run with the standard statuses (`completed` / `partial` / `failed`); `partial` means some selected locations synced and others errored (the per-location errors are on the run). Non-lodging locations are skipped cleanly (Google answers the lodging call with HTTP 400, not 404). Reviews are **not** synced — the v4 Reviews API is producer-restricted by Google and unavailable on most projects; the Q&A API was retired (HTTP 501).
409
+
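The run-status rule described above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not canonry's implementation: `LocationResult` and `deriveRunStatus` are hypothetical names, and treating an empty result list as `completed` is an assumption the source does not confirm.

```typescript
// Hypothetical shapes for illustration only; not canonry's real types.
type RunStatus = "completed" | "partial" | "failed";

interface LocationResult {
  location: string; // e.g. "locations/123"
  error?: string;   // present when this location's sync errored
}

// completed: no location errored; failed: every location errored;
// partial: some locations synced and others errored.
function deriveRunStatus(results: LocationResult[]): RunStatus {
  const errored = results.filter((r) => r.error !== undefined).length;
  if (errored === 0) return "completed"; // assumption: an empty list also lands here
  return errored === results.length ? "failed" : "partial";
}
```

A caller polling with `--wait` would typically treat `partial` as a signal to inspect the per-location errors on the run and retry only the affected locations.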
366
410
  ## Backlinks (Common Crawl)
367
411
 
368
412
  Workspace-level Common Crawl release sync + per-project backlink extraction. Requires DuckDB; install once with `cnry backlinks install`. Releases are downloaded once per workspace and reused across all projects.
@@ -0,0 +1,219 @@
1
+ # Google Business Profile Integration
2
+
3
+ Canonry integrates with the Google Business Profile (GBP) API to surface local AEO signals: search-keyword impressions, daily performance metrics, hotel lodging attributes, and booking/reservation CTAs (plus reviews on the projects where Google has granted v4 access — see the gating section). This data feeds the local-AEO dashboard and the Aero analyst.
4
+
5
+ > **Q&A is not available.** Google shut down the My Business Q&A API — it returns HTTP 501 `API_UNSUPPORTED`. There is no programmatic way to read or write profile Q&A. Don't plan around it.
6
+
7
+ ## What Canonry Automates
8
+
9
+ - Discover GBP accounts and locations the connected user manages, with explicit per-location selection
10
+ - Sync search-keyword impressions aggregated over a date window (default ~12 months; stored with `periodStart`/`periodEnd`)
11
+ - Sync daily performance metrics — impressions, website clicks, call clicks, direction requests (all 11 `DailyMetric`s)
12
+ - For hotels: sync lodging attributes (amenities, accessibility, pets, etc.) and place action links (booking CTAs)
13
+ - Roll the above into a composite summary scorecard (`canonry gbp summary`)
14
+ - Sync reviews per location — **only where Google has granted v4 access** (gated; unavailable on most projects — see below)
15
+
16
+ ## What Stays Manual
17
+
18
+ - Replying to reviews (Phase 4 — write surface)
19
+ - Posting local posts / offers (Phase 4)
20
+ - Updating lodging attributes (Phase 4)
21
+ - Pub/Sub notifications setup (Phase 4)
22
+
23
+ For now, canonry is read-only on GBP. The Aero agent can draft suggested replies, but applying them is manual.
24
+
25
+ ## Hard Prerequisite: API Access Approval
26
+
27
+ **Before anything works**, your Google Cloud project must be approved through Google's Business Profile API Basic Access form. Until approved, every API call returns HTTP 403 / 0 QPM — regardless of OAuth scope or which APIs you've enabled.
28
+
29
+ ### Eligibility requirements
30
+
31
+ From [Google's prerequisites doc](https://developers.google.com/my-business/content/prereqs#request-access):
32
+
33
+ - **Active Verified Profile** — "Manage a Google Business Profile that is verified and active for 60+ days."
34
+ - **Website Requirement** — "Have a website representing the business listed on the GBP."
35
+ - **Profile Completeness** — Google recommends the profile be "fully complete and kept up-to-date with the current business information."
36
+
37
+ Brand-new profiles (under 60 days) and profiles with no associated website are not eligible.
38
+
39
+ ### Submitting the access request
40
+
41
+ 1. Go to the GBP API contact form: <https://support.google.com/business/contact/api_default>
42
+ 2. Select **"Application for Basic API Access"** from the dropdown.
43
+ 3. Provide:
44
+ - Your **Google Cloud Console project number** (Cloud Console → Project Info dashboard, *not* the project ID)
45
+ - Email address listed as an **owner or manager** on the target GBP profile
46
+
47
+ ### Approval timeline
48
+
49
+ - Google sends a follow-up email after review (timing varies; common reports are days to a few weeks).
50
+ - Approval is signaled by a **quota change** in Google Cloud Console:
51
+ - **Not approved**: quota is **0 QPM** (Queries Per Minute) — every API call returns 403.
52
+ - **Approved**: quota is **300 QPM** — all enabled APIs become callable.
53
+
54
+ Check quota at Cloud Console → APIs & Services → Quotas, filtered by one of the GBP APIs.
55
+
56
+ ## GCP Setup
57
+
58
+ ### Enable the right APIs
59
+
60
+ In Cloud Console → APIs & Services → Library, enable:
61
+
62
+ | API | Purpose |
63
+ |---|---|
64
+ | My Business Account Management API | List accounts |
65
+ | My Business Business Information API | List locations + place action links |
66
+ | Business Profile Performance API | Daily metrics + monthly search keywords |
67
+ | My Business Verifications API | (optional) Voice-of-Merchant state |
68
+ | **My Business Lodging API** | **Hotel attributes — required if working with lodging properties** |
69
+ | **My Business Place Actions API** | **Booking / reservation CTAs — required if hotels or restaurants use them** |
70
+
71
+ **Do NOT enable "My Business Q&A API"** — Google shut it down (HTTP 501 `API_UNSUPPORTED`). It's listed in some older setup docs but no longer functions.
72
+
73
+ ### The legacy "Google My Business API" (v4 — reviews)
74
+
75
+ The **reviews** endpoint lives on the legacy `mybusiness.googleapis.com` (v4), a separate API from the v1 family above. It is the single biggest stumbling block, and **production testing (May 2026) proved the Basic API Access approval does NOT grant it.**
76
+
77
+ What we confirmed, with a project approved and running the v1 family at 300 QPM, authenticated as the exact approved account with the `business.manage` scope:
78
+
79
+ - The v4 reviews call still returns `403 SERVICE_DISABLED`.
80
+ - The API is **not searchable in the API Library** — the library page returns "Failed to load."
81
+ - `gcloud services enable mybusiness.googleapis.com` returns `PERMISSION_DENIED` reason `110002` (`AUTH_PERMISSION_DENIED`) **even as the approved account** — this is the signature of a producer-restricted (Google-allowlisted) service that the project owner cannot toggle.
82
+ - Per Google's [Basic setup](https://developers.google.com/my-business/content/basic-setup) doc: *"The Google My Business API is only visible in the Google API Console to users who submit and receive approval for their Google Account through the access request form."*
83
+
84
+ **Conclusion:** the v4 GMB API is gated independently of the v1 approval and Google controls the switch. The only routes are (1) the **shortcut "enable" link from the access-approval email**, opened as the approved account in the browser, or (2) replying to the access-request thread asking Google to enable `mybusiness.googleapis.com` for your project number. Self-service (library, gcloud) does not work. Build reviews behind this gate and ship the rest without it.
85
+
86
+ **Account-credential gotcha:** API calls use your **Application Default Credentials** (`gcloud auth application-default login` → `print-access-token`), while `gcloud services enable` uses the separate **gcloud CLI account** (`gcloud config get-value account`). These can be different identities. Verify the token's real account with `curl "https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=$TOKEN"` before concluding anything about access — a "wrong account" symptom is often just the two credential stores disagreeing.
87
+
88
+ ### Create OAuth client credentials
89
+
90
+ 1. Cloud Console → APIs & Services → **OAuth consent screen** → set up consent (External works). Add your own email under "Test users" while the app is in test mode.
91
+ 2. Cloud Console → APIs & Services → **Credentials** → **+ Create Credentials** → **OAuth client ID**.
92
+ 3. Application type: **Web application**.
93
+ 4. Authorized redirect URIs: `http://localhost:53682/callback` (or whatever port canonry's connect flow uses — match it exactly).
94
+ 5. Save the **Client ID** and **Client secret**.
95
+
96
+ ### Store credentials for canonry
97
+
98
+ Either set env vars before running CLI commands:
99
+
100
+ ```bash
101
+ export GOOGLE_CLIENT_ID="…"
102
+ export GOOGLE_CLIENT_SECRET="…"
103
+ ```
104
+
105
+ Or persist them in `~/.canonry/config.yaml`:
106
+
107
+ ```yaml
108
+ google:
109
+ clientId: "your-client-id"
110
+ clientSecret: "your-client-secret"
111
+ ```
112
+
113
+ OAuth tokens (per-user, obtained at connect time) are stored in the same file under `google.connections`. They are never written to the canonry database.
114
+
115
+ ## Connect a Project
116
+
117
+ Once GCP setup is done and the access form is approved:
118
+
119
+ ```bash
120
+ canonry gbp connect <project>
121
+ canonry gbp accounts <project> # list accessible accounts (pick one)
122
+ canonry gbp locations discover <project> --account accounts/123 # discover that account's locations
123
+ canonry gbp locations <project> # verify discovered locations
124
+ canonry gbp sync <project> --wait # run a first sync
125
+ canonry gbp summary <project> # check derived metrics
126
+ ```
127
+
128
+ The OAuth scope requested is `https://www.googleapis.com/auth/business.manage`. **There is no read-only variant**; Google does not publish one. The consent screen will say "manage your business profile" even though canonry only reads GBP data until the Phase 4 write surface ships.
129
+
130
+ ### Account selection is per project
131
+
132
+ A single OAuth user often manages **multiple GBP accounts** (a personal account, a location group, agency-managed businesses). Each canonry project tracks **one** account's locations — so to track two businesses, use two projects.
133
+
134
+ - **List accounts:** `canonry gbp accounts <project>` (API `GET /gbp/accounts`, MCP `canonry_gbp_accounts`) shows every account the connection can see, with its `accounts/{n}` resource name.
135
+ - **Pick one at discover time:** `canonry gbp locations discover <project> --account accounts/{n}`. Omitting `--account` reuses the account the project already tracks; on the very first discover with no `--account`, canonry falls back to the **first** account the user can see — so if you manage more than one account, always pass `--account` the first time to avoid silently tracking the wrong business.
136
+ - **Switching accounts is destructive:** re-pointing a project at a *different* account would drop the old account's locations and all its synced data, so it's rejected unless you pass `--switch-account` (API `switchAccount: true`). You can also `canonry gbp disconnect <project>` (which now clears the project's entire GBP footprint) and start fresh.
137
+
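The account-selection rules above can be sketched as one small resolver. This is a hedged TypeScript illustration: `resolveAccount` and its option names are hypothetical, and the real CLI may order its checks differently.

```typescript
// Hypothetical inputs for illustration; not canonry's actual API.
interface ResolveOptions {
  requested?: string;      // value passed via --account, if any
  tracked?: string;        // account the project already tracks, if any
  visible: string[];       // accounts the connection can see, in listing order
  switchAccount?: boolean; // --switch-account
}

function resolveAccount(opts: ResolveOptions): string {
  // Precedence: explicit --account, then the tracked account, then the first visible one.
  const target = opts.requested ?? opts.tracked ?? opts.visible[0];
  if (target === undefined) {
    throw new Error("no GBP accounts visible to this connection");
  }
  // Re-pointing at a different account drops synced data, so it needs an explicit opt-in.
  if (opts.tracked !== undefined && target !== opts.tracked && !opts.switchAccount) {
    throw new Error(`project already tracks ${opts.tracked}; pass --switch-account to re-point`);
  }
  return target;
}
```

The first-visible fallback is the branch that can silently pick the wrong business, which is why passing `--account` on the first discover is the safer habit.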
138
+ ## The Summary Scorecard (`canonry gbp summary`)
139
+
140
+ `canonry gbp summary <project> [--location locations/XXX]` (API: `GET /gbp/summary`, MCP: `canonry_gbp_summary`) is the single composite read that rolls every synced GBP surface into one scorecard. All math lives in the API (`buildGbpSummary`) — the CLI and dashboard only render it, so `--format json` matches the API response field-for-field. Fields:
141
+
142
+ - **`scope`** — `{ locationName, locationCount }`. `locationName` is null when summarizing across all selected locations.
143
+ - **`performance`** — daily-metric roll-up:
144
+ - `totals` — sum per `DailyMetric` over everything synced.
145
+ - `recent7d` / `prior7d` — per-metric sums for the last 7 days vs the 7 days before, anchored to the **most recent stored metric date** (not wall-clock — GBP data lags ~2–3 days, so anchoring to "today" would always show empty recent windows). Both maps are backfilled with the union of metrics as explicit `0`s so a metric present in only one window still appears in both.
146
+ - `deltaPct` — percent change recent-vs-prior per metric; **`null` when the prior window is `0`** (no divide-by-zero, and "appeared from nothing" is not a percentage).
147
+ - **`keywords`** — `{ total, thresholdedCount, thresholdedPct }`. `thresholdedPct` (0–100) is the share of keywords whose exact count Google redacted — your headline data-fidelity number (expect ~89% for a busy hotel, 100% for an SMB).
148
+ - **`placeActions`** — `{ total, hasReservationCta, hasBookingCta, hasDirectMerchantCta }`. `hasDirectMerchantCta` is false when the only booking links are OTA/aggregator (Expedia/Booking) — a recommendation to add a direct CTA.
149
+ - **`lodging`** — `{ lodgingLocationCount, populatedLodgingCount, emptyLodgingCount }`. `emptyLodgingCount` counts lodging-capable locations with zero structured attributes — the AEO gap to surface.
150
+
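The `recent7d` / `prior7d` / `deltaPct` rules above can be sketched as follows. This is an illustrative TypeScript sketch, not the actual `buildGbpSummary`: `MetricRow`, `summarizeWindows`, and the ISO `YYYY-MM-DD` date format are assumptions made for the example.

```typescript
// Hypothetical stored-row shape; `date` is assumed to be an ISO YYYY-MM-DD string.
interface MetricRow { date: string; metric: string; value: number }

interface WindowSummary {
  recent7d: Record<string, number>;
  prior7d: Record<string, number>;
  deltaPct: Record<string, number | null>;
}

const DAY_MS = 86400000;

// Whole days between two ISO dates, computed in UTC.
function daysBetween(later: string, earlier: string): number {
  return Math.round((Date.parse(`${later}T00:00:00Z`) - Date.parse(`${earlier}T00:00:00Z`)) / DAY_MS);
}

function bump(map: Record<string, number>, key: string, by: number): void {
  map[key] = (map[key] ?? 0) + by;
}

function summarizeWindows(rows: MetricRow[]): WindowSummary {
  const recent7d: Record<string, number> = {};
  const prior7d: Record<string, number> = {};
  const deltaPct: Record<string, number | null> = {};

  // Anchor on the most recent stored date, not on wall-clock "today".
  let anchor = "";
  for (const r of rows) if (r.date > anchor) anchor = r.date;

  for (const r of rows) {
    const age = daysBetween(anchor, r.date);
    if (age >= 14) continue; // outside both windows
    // Touch both maps so a metric seen in one window appears in both (as 0).
    bump(recent7d, r.metric, age < 7 ? r.value : 0);
    bump(prior7d, r.metric, age < 7 ? 0 : r.value);
  }

  for (const metric of Object.keys(recent7d)) {
    const recent = recent7d[metric] ?? 0;
    const prior = prior7d[metric] ?? 0;
    // null rather than a divide-by-zero when the prior window is empty.
    deltaPct[metric] = prior === 0 ? null : ((recent - prior) / prior) * 100;
  }
  return { recent7d, prior7d, deltaPct };
}
```

Anchoring on the latest stored date keeps the recent window populated even when GBP data lags by a few days.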
151
+ ## Hotel-Specific Setup
152
+
153
+ For hotel groups, two extra signal sources are critical:
154
+
155
+ ### Lodging attributes (`canonry gbp lodging`)
156
+
157
+ AI engines (Gemini, Perplexity, ChatGPT) pull hotel attributes verbatim from the Lodging resource to answer queries like "does X hotel have a pool?" or "is Y pet-friendly?". Coverage of these structured fields is the **highest-signal AEO surface for hotels** — higher than reviews or website content.
158
+
159
+ Canonry computes a coverage score per property and flags missing high-signal attributes (pool, free wifi, pets, parking, breakfast, fitness, spa, accessibility) in the summary endpoint.
160
+
161
+ A non-lodging location returns HTTP 400 `FAILED_PRECONDITION` ("This operation is not supported for this location. Please check the value of `Location.location_state.can_operate_lodging_data`") — not a 404. Fix the primary category in the Business Profile UI to a lodging category first. A lodging-category location with no amenities filled in returns HTTP 200 with only `{ "name": "..." }` — that empty profile is itself a finding (AI engines have no structured amenity data to cite).
162
+
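The three lodging outcomes described above can be told apart with a small classifier. This TypeScript sketch is illustrative only: `classifyLodging` and the finding labels are hypothetical, and counting every top-level key other than `name` as a populated group is an assumption about how `populatedGroupCount` is derived. The `error.status` field follows Google's standard JSON error envelope.

```typescript
// Hypothetical labels for illustration; not canonry's actual types.
type LodgingFinding = "not-lodging" | "empty-profile" | "populated" | "error";

interface LodgingResponse {
  name?: string;
  error?: { status?: string };
  [group: string]: unknown; // amenity groups, when present
}

function classifyLodging(
  httpStatus: number,
  body: LodgingResponse,
): { finding: LodgingFinding; populatedGroupCount: number } {
  // A non-lodging location answers 400 FAILED_PRECONDITION, not 404: skip it cleanly.
  if (httpStatus === 400 && body.error?.status === "FAILED_PRECONDITION") {
    return { finding: "not-lodging", populatedGroupCount: 0 };
  }
  if (httpStatus !== 200) {
    return { finding: "error", populatedGroupCount: 0 };
  }
  // Assumption: any top-level key besides `name` is one populated attribute group.
  const groups = Object.keys(body).filter((k) => k !== "name");
  return {
    finding: groups.length === 0 ? "empty-profile" : "populated",
    populatedGroupCount: groups.length,
  };
}
```

The `empty-profile` result is a finding to report to the operator, so it is kept distinct from `error`.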
163
+ ### Place action links (`canonry gbp place-actions`)
164
+
165
+ Booking and reservation CTAs surfaced in AI answers come from `placeActionLinks`, not from the website URL. Canonry tracks:
166
+
167
+ - `placeActionType` (`RESERVATION`, `BOOK`, `ORDER_FOOD`, …)
168
+ - `providerType` (`MERCHANT` for direct, `AGGREGATOR` for OTA links like Expedia/Booking)
169
+ - `isPreferred` flag
170
+ - `uri`
171
+
172
+ A property with only aggregator booking links and no direct merchant CTA is a recommendation to surface.
173
+
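The direct-versus-aggregator check described above can be sketched as follows. This is a minimal TypeScript illustration: `PlaceActionLink` mirrors the four tracked fields, while `directCtaCheck` and `needsDirectCta` are hypothetical names introduced for the example.

```typescript
// Mirrors the tracked fields listed above; the interface name is illustrative.
interface PlaceActionLink {
  placeActionType: string;
  providerType: string; // "MERCHANT" for direct links; anything else is treated as an aggregator
  isPreferred: boolean;
  uri: string;
}

function directCtaCheck(links: PlaceActionLink[]): {
  total: number;
  hasDirectMerchantCta: boolean;
  needsDirectCta: boolean;
} {
  const hasDirectMerchantCta = links.some((l) => l.providerType === "MERCHANT");
  return {
    total: links.length,
    hasDirectMerchantCta,
    // Links exist, but none of them point at the merchant's own booking flow.
    needsDirectCta: links.length > 0 && !hasDirectMerchantCta,
  };
}
```

Keeping `needsDirectCta` separate from an empty list lets a report distinguish "no CTAs at all" from "only OTA links".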
174
+ ## Important Constraints
175
+
176
+ - **No read-only OAuth scope**: `business.manage` is the only published scope. The consent screen will warn about write access even though canonry's current GBP integration is read-only.
177
+ - **300 QPM shared quota** — across all GBP sub-APIs on one Google Cloud project. Canonry's sync worker caps per-location concurrency at 4 (~28 in-flight calls at peak) to stay well under the cap.
178
+ - **10 edits/min per profile** — hard cap on writes (relevant for Phase 4). Cannot be raised.
179
+ - **Privacy-redacted keyword impressions** — for each keyword aggregated over the requested window, Google returns either an exact `value` or only a `threshold` floor (`<N`). Canonry stores both shapes (`valueCount` / `valueThreshold`) against the row's `periodStart`/`periodEnd` and surfaces a "% thresholded" stat so the user understands data fidelity. Note the Performance API aggregates each keyword over the **whole** requested date range — it does not break impressions down per calendar month.
180
+ - **Hybrid v1/v4 surface, separately gated** — reviews live on the legacy v4 host (`mybusiness.googleapis.com/v4`); everything else is v1. **The Basic API Access approval grants the v1 family but NOT v4.** Confirmed in production: a project running v1 at 300 QPM still gets `403 SERVICE_DISABLED` on v4 reviews, and the v4 API is producer-restricted (`gcloud services enable` → `PERMISSION_DENIED 110002`) so it can't be self-enabled even by the approved account. Treat reviews as a separately-gated surface that may be unavailable; never block the rest of the integration on it.
181
+ - **Multi-location chains** — a 200-location chain hits ~600+ API calls per sync. Default sync may take minutes for large chains; scope with `canonry gbp sync <project> --location locations/XXX` to retry a subset.
182
+
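The "% thresholded" stat from the keyword constraint above can be sketched like this. It is a hedged TypeScript illustration: `valueCount` and `valueThreshold` come from the text, but the nullable row shape, the function names, and returning 0 for an empty keyword list are assumptions.

```typescript
// Assumed stored-row shape: exactly one of valueCount / valueThreshold is non-null.
interface StoredKeyword {
  keyword: string;
  valueCount: number | null;     // exact impressions, when Google returned `value`
  valueThreshold: number | null; // floor N, when Google returned only `threshold`
}

function keywordFidelity(rows: StoredKeyword[]): {
  total: number;
  thresholdedCount: number;
  thresholdedPct: number;
} {
  const total = rows.length;
  const thresholdedCount = rows.filter(
    (r) => r.valueCount === null && r.valueThreshold !== null,
  ).length;
  // Assumption: report 0 rather than NaN when there are no keywords at all.
  const thresholdedPct = total === 0 ? 0 : (thresholdedCount / total) * 100;
  return { total, thresholdedCount, thresholdedPct };
}

// Render an exact count as-is and a redacted one as a "<N" floor.
function renderKeywordCount(r: StoredKeyword): string {
  if (r.valueCount !== null) return String(r.valueCount);
  return r.valueThreshold !== null ? `<${r.valueThreshold}` : "n/a";
}
```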
183
+ ## Real-World Data Shapes & Signal Patterns
184
+
185
+ Validated against three live businesses of different types (a computer-support shop, a roofing contractor, and a Venice Beach hotel). Bake these into any parsing or analysis code.
186
+
187
+ ### Response-shape quirks (the parser MUST handle these)
188
+
189
+ - **Values are string-encoded integers.** Keyword counts come as `{ "insightsValue": { "value": "10939" } }` or `{ "insightsValue": { "threshold": "15" } }` — note the nesting under `insightsValue` and that `"10939"` is a string. `Number()` it.
190
+ - **Daily-metric zero days omit the value entirely.** A datedValue with no traffic is `{ "date": {"year":2026,"month":5,"day":1} }` — there is no `"value": "0"`. Treat a missing `value` as 0; don't skip the row.
191
+ - **Dates are split objects** (`{year, month, day}`), not ISO strings. Reassemble.
192
+
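A parser that handles the three quirks above might look like the following TypeScript sketch. The type and function names are hypothetical and only model the JSON shapes quoted in the list; they are not canonry's actual parser.

```typescript
// Hypothetical types modelling the shapes quoted above.
interface InsightsValue { value?: string; threshold?: string }
interface GbpDate { year: number; month: number; day: number }
interface DatedValue { date: GbpDate; value?: string }

type KeywordCount =
  | { kind: "exact"; count: number }
  | { kind: "threshold"; floor: number };

// Keyword counts are string-encoded integers nested under `insightsValue`.
function parseInsightsValue(v: InsightsValue): KeywordCount | null {
  if (v.value !== undefined) return { kind: "exact", count: Number(v.value) };
  if (v.threshold !== undefined) return { kind: "threshold", floor: Number(v.threshold) };
  return null; // neither field present
}

// Reassemble the split {year, month, day} object into an ISO YYYY-MM-DD string.
function toIsoDate(d: GbpDate): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  return `${d.year}-${pad(d.month)}-${pad(d.day)}`;
}

// A zero-traffic day omits `value` entirely: keep the row and record 0.
function parseDatedValue(dv: DatedValue): { date: string; value: number } {
  return {
    date: toIsoDate(dv.date),
    value: dv.value === undefined ? 0 : Number(dv.value),
  };
}
```

Returning a tagged union for keyword counts keeps the exact and thresholded shapes from being mixed up further down the pipeline.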
193
+ ### Signal patterns (what the data actually looks like)
194
+
195
+ - **`BUSINESS_DIRECTION_REQUESTS` is the most reliably-populated conversion signal** across every business type — even a tiny roofing contractor logged 66/30d while its website-clicks (2) and call-clicks (1) were near-zero. For local/service businesses it's the headline AEO-conversion proxy, not website clicks.
196
+ - **Most of the 11 daily metrics are all-zero** for non-retail businesses (`BUSINESS_CONVERSATIONS`, `BUSINESS_BOOKINGS`, `BUSINESS_FOOD_*` were 0 for all three). Syncing all 11 is fine (zeros are cheap) but the dashboard should hide all-zero series.
197
+ - **Impressions skew to Maps for physical-destination businesses.** The hotel pulled 7,402 desktop-maps impressions vs 2,257 desktop-search in 30 days — people find it on Maps.
198
+ - **Keyword thresholding scales with volume.** A busy hotel was ~89% thresholded (its head terms like `hotels`→10,939 had exact values); both small businesses were **100% thresholded** (every keyword redacted). For the typical SMB location, expect zero exact keyword values — design the UI to lead with the `<N` floor, not exact counts.
199
+ - **Empty lodging / place-action profiles are the norm, and the emptiness is the product.** A real operating hotel returned a lodging resource with only `{ "name": ... }` (no amenities) and zero place-action links. That gap — "AI engines have no structured amenity data or direct-booking CTA to cite" — is exactly what canonry should surface, not an error to suppress.
200
+
201
+ ## Troubleshooting
202
+
203
+ | Symptom | Likely cause | Fix |
204
+ |---|---|---|
205
+ | Every API call returns HTTP 403 | Access form not yet approved (0 QPM) | Submit the form; wait for approval email. Check quota in Cloud Console. |
206
+ | `redirect_uri_mismatch` during connect | OAuth client doesn't include the canonry callback URL | Add `http://localhost:53682/callback` (or the canonry-configured URL) to the OAuth client's authorized redirect URIs |
207
+ | "App not verified" warning at consent | Consent screen in test mode | Add the OAuth user to test users, or publish the consent screen |
208
+ | Empty accounts list after connect | OAuth user lacks manager access on any profile | Ask the profile owner to add the user at [business.google.com](https://business.google.com) → Users |
209
+ | Lodging endpoint returns 400 `FAILED_PRECONDITION` | Location primary category is not a lodging category | Update the primary category in the GBP UI to `Hotel`, `Resort`, `Motel`, etc. |
210
+ | Lodging returns 200 with only `{ "name": ... }` | The lodging profile has no amenities filled in | Not an error — it's an AEO gap to flag to the operator |
211
+ | Place action links empty | No CTAs configured | Set them up in the GBP UI; for many local businesses this is genuinely empty (an AEO gap) |
212
+ | Reviews 403 `SERVICE_DISABLED` while v1 APIs work | Legacy v4 `mybusiness.googleapis.com` not enabled for this account/project | See "The legacy Google My Business API" above — enable via the approval-email shortcut as the approved account; can't be done via library or gcloud |
213
+ | Q&A returns HTTP 501 `API_UNSUPPORTED` | Google shut down the Q&A API | Permanent — Q&A is not available programmatically |
214
+ | Keyword impressions mostly `threshold` instead of `value` | Low-volume keywords are privacy-redacted by Google | Expected: even a busy hotel can be ~89% thresholded; tiny businesses are 100%. Surfaced as `keywords.thresholdedPct` in the summary |
215
+
216
+ ## Related Files in This Skill
217
+
218
+ - `references/canonry-cli.md` — full CLI command reference
219
+ - `references/aeo-analysis.md` — interpretation patterns for citation and visibility data
@@ -1 +1 @@
1
- import{r as n,j as e}from"./vendor-tanstack-Dq7p98wZ.js";import{c as E,bm as O,Z as V,bn as U,bo as Y,bp as K,g as l,bq as L,T as f,B as z,i as A,a6 as Z,br as J,bs as X,a0 as ee,bt as se}from"./index-XUKhruAg.js";import{C as te,D as ae,T as ne,a as ce}from"./trash-2-B5clF2rU.js";import"./vendor-radix-B57xfQbP.js";import"./vendor-recharts-DWvKDyBF.js";import"./vendor-markdown-DK7fbRNb.js";const ie=[["circle",{cx:"12",cy:"12",r:"10",key:"1mglay"}],["line",{x1:"12",x2:"12",y1:"8",y2:"12",key:"1pkeuh"}],["line",{x1:"12",x2:"12.01",y1:"16",y2:"16",key:"4dfq90"}]],re=E("circle-alert",ie);const le=[["path",{d:"M15 3h6v6",key:"1q9fwt"}],["path",{d:"M10 14 21 3",key:"gplh6r"}],["path",{d:"M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6",key:"a6xqqp"}]],de=E("external-link",le);function h({children:t,label:m="More info",placement:a="top",className:d}){const o=n.useId(),[y,r]=n.useState(!1);return e.jsxs("span",{className:`relative inline-flex ${d??""}`,children:[e.jsx("button",{type:"button","aria-label":m,"aria-describedby":y?o:void 0,className:"inline-flex h-4 w-4 items-center justify-center rounded-full text-zinc-500 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500",onMouseEnter:()=>r(!0),onMouseLeave:()=>r(!1),onFocus:()=>r(!0),onBlur:()=>r(!1),children:e.jsx(ce,{className:"h-3.5 w-3.5","aria-hidden":!0})}),y&&e.jsx("span",{id:o,role:"tooltip",className:`absolute z-50 w-64 rounded border border-zinc-700 bg-zinc-900 px-3 py-2 text-xs font-normal leading-relaxed text-zinc-200 shadow-lg ${a==="top"?"bottom-full mb-2":"top-full mt-2"} left-1/2 -translate-x-1/2 whitespace-normal`,children:t})]})}const oe="https://commoncrawl.org/web-graphs";function w(t){return t==null?"—":t>=1e12?`${(t/1e12).toFixed(1)} TB`:t>=1e9?`${(t/1e9).toFixed(1)} GB`:t>=1e6?`${(t/1e6).toFixed(1)} MB`:t>=1e3?`${(t/1e3).toFixed(1)} KB`:`${t} B`}function b(t){if(!t)return"—";const m=Date.now()-new 
Date(t).getTime(),a=Math.floor(m/6e4);if(a<1)return"just now";if(a<60)return`${a}m ago`;const d=Math.floor(a/60);return d<24?`${d}h ago`:`${Math.floor(d/24)}d ago`}function v(t){switch(t){case"ready":return"positive";case"failed":return"negative";case"downloading":case"querying":case"queued":return"caution"}}function fe(){const[t,m]=n.useState(null),[a,d]=n.useState(null),[o,y]=n.useState([]),[r,M]=n.useState([]),[u,P]=n.useState(null),[D,C]=n.useState(!0),[S,B]=n.useState(!1),[p,R]=n.useState(!1),[N,g]=n.useState(""),[F,k]=n.useState(!1),[q,i]=n.useState(null),[I,x]=n.useState(null),j=n.useCallback(async()=>{C(!0),i(null);try{const[s,c,_,W,H]=await Promise.all([O(),V().catch(()=>null),U().catch(()=>[]),Y().catch(()=>[]),K().catch(()=>null)]);m(s),d(c),y(_),M(W),P(H)}catch(s){i(s instanceof Error?s.message:"Failed to load backlinks status")}finally{C(!1)}},[]);n.useEffect(()=>{j()},[j]);async function $(){B(!0),i(null),x(null);try{const s=await J();x(s.alreadyPresent?`DuckDB already installed (${s.version}).`:`Installed DuckDB ${s.version}.`),await j()}catch(s){i(s instanceof Error?s.message:"Failed to install DuckDB")}finally{B(!1)}}async function T(){const s=N.trim()||void 0;R(!0),i(null),x(null);try{const c=await X(s);x(s?`Queued sync for ${c.release}. Download + query runs in the background.`:`Queued sync for auto-discovered release ${c.release}. Download + query runs in the background.`),g(""),k(!1),await j()}catch(c){c instanceof ee&&c.code==="MISSING_DEPENDENCY"?i("DuckDB is not installed. 
Install it first."):i(c instanceof Error?c.message:"Failed to trigger sync")}finally{R(!1)}}async function Q(s){i(null),x(null);try{await se(s),x(`Pruned cached release ${s}.`),await j()}catch(c){i(c instanceof Error?c.message:"Failed to prune release")}}const G=a?.status==="ready"&&r.every(s=>s.release!==a.release);return e.jsxs("div",{className:"page-container",children:[e.jsx("div",{className:"page-header",children:e.jsxs("div",{className:"page-header-left",children:[e.jsx("h1",{className:"page-title",children:"Backlinks"}),e.jsx("p",{className:"page-subtitle",children:"Find domains that link to your projects, computed from the open Common Crawl web graph. Runs entirely on your machine — nothing is sent to third parties."})]})}),e.jsx(l,{className:"surface-card p-4 mb-6 border-amber-800/60",children:e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(L,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-sm text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Heads up — a release sync is a large download."}),e.jsxs("ul",{className:"mt-1.5 space-y-1 text-zinc-400",children:[e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"~16 GB"})," of gzipped vertex + edge files per release, stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),"."]}),e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"10–20 min on a fast connection"})," for the download, then ~5 min for the DuckDB query."]}),e.jsx("li",{children:"One sync covers every project in this workspace. 
Releases are immutable, so the download only happens once per release."})]})]})]})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"About"}),e.jsx("h2",{children:"How it works"})]})}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-sm text-zinc-400 leading-relaxed max-w-3xl mb-4",children:["Common Crawl publishes a quarterly snapshot of the public web’s hyperlink graph. Canonry downloads one"," ",e.jsx("span",{className:"text-zinc-200",children:"release"})," at a time and extracts backlinks for every project in this workspace in a single pass."]}),e.jsxs("ol",{className:"space-y-3 text-sm text-zinc-400 max-w-3xl",children:[e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"1"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Download (one-time, ~16 GB)"})," — vertex + edge files cached to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),". Runs once per release; subsequent operations reuse the cache."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"2"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Query (~5 min)"})," — one DuckDB pass scans the cached files and extracts referring domains for every project’s canonical domain. 
DuckDB is only used to ",e.jsx("span",{className:"text-zinc-200",children:"read"})," these dumps; it doesn’t store any canonry state."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"3"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Persist"})," — results land in the same SQLite database the rest of canonry uses. After the first sync, per-project reads (and re-run extracts against the cached release) are instant."]})]})]})]})]}),q&&e.jsx(l,{className:"surface-card p-4 mb-4 border-rose-800/60",children:e.jsx("p",{className:"text-sm text-rose-300",children:q})}),I&&e.jsx(l,{className:"surface-card p-4 mb-4 border-emerald-800/60",children:e.jsx("p",{className:"text-sm text-emerald-300",children:I})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Dependency"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["DuckDB install status",e.jsxs(h,{label:"Why DuckDB?",children:[e.jsx("span",{className:"block",children:"DuckDB is a query engine canonry uses to scan the ~16 GB Common Crawl dumps and pull out your referring domains."}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["It does ",e.jsx("span",{className:"text-zinc-200",children:"not"})," store any canonry data — your backlink results live in SQLite alongside the rest of your projects. 
DuckDB is purely a tool for processing the raw CSV files."]}),e.jsxs("span",{className:"mt-2 block text-zinc-500",children:["Installed on demand (not bundled) into ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/plugins/"})," so users who never run backlinks don’t pay the ~40 MB install cost."]})]})]})]}),t?.duckdbInstalled?e.jsx(f,{tone:"positive",children:"Installed"}):e.jsx(f,{tone:"caution",children:"Not installed"})]}),e.jsx(l,{className:"surface-card p-5",children:D?e.jsx("p",{className:"text-sm text-zinc-500",children:"Checking…"}):t?.duckdbInstalled?e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(te,{className:"h-5 w-5 text-emerald-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-sm text-zinc-200",children:["Version ",t.duckdbVersion??"unknown"," installed at"," ",e.jsx("code",{className:"text-zinc-300",children:t.pluginDir})]}),e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Required spec: ",t.duckdbSpec]})]})]}):e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(re,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"flex-1",children:[e.jsx("p",{className:"text-sm text-zinc-200",children:"DuckDB is not installed. It’s the query engine canonry uses to scan Common Crawl dumps — required before you can run a release sync or per-project extract."}),e.jsx("p",{className:"text-xs text-zinc-500 mt-1",children:"Installing doesn’t touch your project data. 
DuckDB only reads the downloaded CSV files; backlink results are written to the same SQLite database canonry already uses."}),t&&e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Will be installed into ",e.jsx("code",{className:"text-zinc-300",children:t.pluginDir})," (~40 MB)."]}),e.jsx("div",{className:"mt-3",children:e.jsxs(z,{type:"button",size:"sm",disabled:S,onClick:A($),children:[e.jsx(ae,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),S?"Installing…":"Install DuckDB"]})})]})]})})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Latest sync"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Release sync",e.jsx(h,{label:"What is a release sync?",children:"A release sync downloads one Common Crawl dump (~16 GB) and extracts backlinks for every project in this workspace in one pass. This is the heavy job — subsequent per-project re-runs skip the download and just re-query the cached files."})]})]}),a&&e.jsx(f,{tone:v(a.status),children:a.status})]}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-xs text-zinc-500 max-w-3xl mb-4",children:["A release is one Common Crawl dump (e.g. ",e.jsx("code",{className:"text-zinc-400",children:"cc-main-2026-jan-feb-mar"}),"). 
Syncing it downloads the graph and populates backlinks for every project in this workspace."]}),a?e.jsxs("div",{className:"space-y-2 text-sm",children:[e.jsxs("p",{className:"text-zinc-200",children:["Release ",e.jsx("code",{className:"text-zinc-300",children:a.release})]}),a.phaseDetail&&e.jsx("p",{className:"text-zinc-500",children:a.phaseDetail}),e.jsxs("div",{className:"grid grid-cols-2 md:grid-cols-4 gap-4 text-xs text-zinc-500 pt-2",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Projects"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:a.projectsProcessed??"—"})]}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-zinc-600 uppercase tracking-wide flex items-center gap-1",children:["Rows",e.jsx(h,{label:"What are rows?",children:"Total number of (project, referring domain) pairs persisted in SQLite from this sync, across every project in the workspace."})]}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:a.domainsDiscovered??"—"})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Started"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(a.downloadStartedAt??a.createdAt)})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Finished"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(a.queryFinishedAt)})]})]}),a.error&&e.jsx("p",{className:"text-sm text-rose-400 pt-2",children:a.error})]}):e.jsx("p",{className:"text-sm text-zinc-500",children:"No release sync has run in this workspace yet."}),G&&e.jsx("div",{className:"mt-4 rounded border border-amber-800/60 bg-amber-950/20 p-3",children:e.jsxs("div",{className:"flex items-start gap-2",children:[e.jsx(L,{className:"h-4 w-4 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-xs text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Cached files for this release 
are missing."}),e.jsxs("p",{className:"mt-1 text-zinc-400",children:["The sync record in the database says this release finished successfully, but the ~16 GB dump at"," ",e.jsxs("code",{className:"text-zinc-300",children:["~/.canonry/cache/commoncrawl/",a?.release,"/"]})," isn’t on disk. Your backlink data is still intact (it lives in SQLite), but per-project re-run extracts will fail until you either re-sync this release or start a new one."]})]})]})}),e.jsxs("div",{className:"mt-4 rounded border border-zinc-800 bg-zinc-900/40 p-3",children:[e.jsxs("div",{className:"flex items-start justify-between gap-3 mb-3",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-[10px] uppercase tracking-wide text-zinc-500",children:"Auto-detected release"}),u?e.jsxs("p",{className:"text-sm text-zinc-200 mt-0.5",children:[e.jsx("code",{className:"text-zinc-100",children:u.release}),e.jsxs("span",{className:"ml-2 text-xs text-zinc-500",children:["— vertex ",w(u.vertexBytes),", edges ",w(u.edgesBytes)]})]}):e.jsx("p",{className:"text-sm text-zinc-500 mt-0.5",children:D?"Probing Common Crawl…":"Could not auto-detect — pass an explicit release below."}),e.jsxs("a",{href:oe,target:"_blank",rel:"noopener noreferrer",className:"mt-1 inline-flex items-center gap-1 text-xs text-zinc-400 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",children:["Browse all Common Crawl web-graph releases",e.jsx(de,{className:"h-3 w-3","aria-hidden":!0})]})]}),e.jsxs("div",{className:"flex items-center gap-2 shrink-0",children:[e.jsxs(z,{type:"button",size:"sm",disabled:p||!t?.duckdbInstalled||!u&&!N.trim(),onClick:A(T),children:[e.jsx(Z,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),p?"Queuing…":"Run sync"]}),e.jsxs(h,{label:"What does Run sync do?",children:[e.jsxs("span",{className:"block",children:["Downloads the auto-detected (or chosen) Common Crawl release (~16 GB) to"," 
",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),", then runs a single DuckDB query that extracts referring domains for every project in this workspace."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["First time for a release: ",e.jsx("span",{className:"text-zinc-200",children:"~10–20 min download + ~5 min query"}),". Re-running the same release later: ",e.jsx("span",{className:"text-zinc-200",children:"skips download, just re-queries"})," (~5 min)."]})]})]})]}),F?e.jsxs("div",{className:"flex flex-wrap items-center gap-2",children:[e.jsx("input",{type:"text",className:"flex-1 min-w-[240px] rounded border border-zinc-700 bg-transparent px-2.5 py-1.5 text-sm text-zinc-200 placeholder-zinc-600 focus:border-zinc-500 focus:outline-none",placeholder:"cc-main-2026-jan-feb-mar",value:N,onChange:s=>g(s.target.value),disabled:p,autoFocus:!0}),e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>{g(""),k(!1)},disabled:p,children:"Cancel"})]}):e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>k(!0),disabled:p,children:"Use a different release →"})]}),!t?.duckdbInstalled&&e.jsx("p",{className:"text-xs text-zinc-600 mt-2",children:"Install DuckDB first to enable sync."})]})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Cached releases"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Local disk cache",e.jsxs(h,{label:"What is this?",children:[e.jsxs("span",{className:"block",children:["Raw Common Crawl dumps stored at"," 
",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/<release>/"}),". Each release takes ~16 GB."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["These files are needed to re-run per-project extracts against a release without re-downloading. Pruning here ",e.jsx("span",{className:"text-zinc-200",children:"does not delete your backlink data"})," — that lives in SQLite."]})]})]})]})}),e.jsx("p",{className:"text-xs text-zinc-500 mb-3 max-w-3xl",children:"Each cached release is a ~16 GB pair of gzipped files. They’re needed to re-query the graph (e.g. for a newly-added project) without re-downloading. Safe to prune — backlink results persist in SQLite."}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Sync status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Size"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Last used"}),e.jsx("th",{className:"px-4 py-2 font-medium sr-only",children:"Actions"})]})}),e.jsxs("tbody",{children:[r.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:s.syncStatus?e.jsx(f,{tone:v(s.syncStatus),children:s.syncStatus}):e.jsx("span",{className:"text-zinc-600",children:"—"})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:w(s.bytes)}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.lastUsedAt)}),e.jsx("td",{className:"px-4 py-2 text-right",children:e.jsxs("div",{className:"inline-flex items-center 
gap-1",children:[e.jsxs(z,{type:"button",variant:"outline",size:"sm",onClick:()=>{Q(s.release)},children:[e.jsx(ne,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),"Prune"]}),e.jsx(h,{label:"What does Prune do?",placement:"top",children:"Deletes the ~16 GB cache for this release from disk. Backlink results already in SQLite remain untouched. To re-run extracts against this release, you’d have to sync it again (another ~16 GB download)."})]})})]},s.release)),r.length===0&&e.jsx("tr",{children:e.jsx("td",{className:"px-4 py-4 text-sm text-zinc-500",colSpan:5,children:"No cached releases on this machine. If you ran a sync from a different machine (or deleted the cache), the backlink data is still in the database — but you’ll need to re-sync a release to run new extracts."})})]})]})})]}),o.length>1&&e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"History"}),e.jsx("h2",{children:"Past release syncs"})]})}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Projects"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Rows"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Finished"})]})}),e.jsx("tbody",{children:o.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 
py-2",children:e.jsx(f,{tone:v(s.status),children:s.status})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.projectsProcessed??"—"}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.domainsDiscovered??"—"}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.queryFinishedAt??s.updatedAt)})]},s.id))})]})})]})]})}export{fe as BacklinksPage};
+ import{r as n,j as e}from"./vendor-tanstack-Dq7p98wZ.js";import{c as E,bm as O,Z as V,bn as U,bo as Y,bp as K,g as l,bq as L,T as f,B as z,i as A,a6 as Z,br as J,bs as X,a0 as ee,bt as se}from"./index-BrCh3uvb.js";import{C as te,D as ae,T as ne,a as ce}from"./trash-2-BgGGPjQf.js";import"./vendor-radix-B57xfQbP.js";import"./vendor-recharts-DWvKDyBF.js";import"./vendor-markdown-DK7fbRNb.js";const ie=[["circle",{cx:"12",cy:"12",r:"10",key:"1mglay"}],["line",{x1:"12",x2:"12",y1:"8",y2:"12",key:"1pkeuh"}],["line",{x1:"12",x2:"12.01",y1:"16",y2:"16",key:"4dfq90"}]],re=E("circle-alert",ie);const le=[["path",{d:"M15 3h6v6",key:"1q9fwt"}],["path",{d:"M10 14 21 3",key:"gplh6r"}],["path",{d:"M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6",key:"a6xqqp"}]],de=E("external-link",le);function h({children:t,label:m="More info",placement:a="top",className:d}){const o=n.useId(),[y,r]=n.useState(!1);return e.jsxs("span",{className:`relative inline-flex ${d??""}`,children:[e.jsx("button",{type:"button","aria-label":m,"aria-describedby":y?o:void 0,className:"inline-flex h-4 w-4 items-center justify-center rounded-full text-zinc-500 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500",onMouseEnter:()=>r(!0),onMouseLeave:()=>r(!1),onFocus:()=>r(!0),onBlur:()=>r(!1),children:e.jsx(ce,{className:"h-3.5 w-3.5","aria-hidden":!0})}),y&&e.jsx("span",{id:o,role:"tooltip",className:`absolute z-50 w-64 rounded border border-zinc-700 bg-zinc-900 px-3 py-2 text-xs font-normal leading-relaxed text-zinc-200 shadow-lg ${a==="top"?"bottom-full mb-2":"top-full mt-2"} left-1/2 -translate-x-1/2 whitespace-normal`,children:t})]})}const oe="https://commoncrawl.org/web-graphs";function w(t){return t==null?"—":t>=1e12?`${(t/1e12).toFixed(1)} TB`:t>=1e9?`${(t/1e9).toFixed(1)} GB`:t>=1e6?`${(t/1e6).toFixed(1)} MB`:t>=1e3?`${(t/1e3).toFixed(1)} KB`:`${t} B`}function b(t){if(!t)return"—";const m=Date.now()-new 
Date(t).getTime(),a=Math.floor(m/6e4);if(a<1)return"just now";if(a<60)return`${a}m ago`;const d=Math.floor(a/60);return d<24?`${d}h ago`:`${Math.floor(d/24)}d ago`}function v(t){switch(t){case"ready":return"positive";case"failed":return"negative";case"downloading":case"querying":case"queued":return"caution"}}function fe(){const[t,m]=n.useState(null),[a,d]=n.useState(null),[o,y]=n.useState([]),[r,M]=n.useState([]),[u,P]=n.useState(null),[D,C]=n.useState(!0),[S,B]=n.useState(!1),[p,R]=n.useState(!1),[N,g]=n.useState(""),[F,k]=n.useState(!1),[q,i]=n.useState(null),[I,x]=n.useState(null),j=n.useCallback(async()=>{C(!0),i(null);try{const[s,c,_,W,H]=await Promise.all([O(),V().catch(()=>null),U().catch(()=>[]),Y().catch(()=>[]),K().catch(()=>null)]);m(s),d(c),y(_),M(W),P(H)}catch(s){i(s instanceof Error?s.message:"Failed to load backlinks status")}finally{C(!1)}},[]);n.useEffect(()=>{j()},[j]);async function $(){B(!0),i(null),x(null);try{const s=await J();x(s.alreadyPresent?`DuckDB already installed (${s.version}).`:`Installed DuckDB ${s.version}.`),await j()}catch(s){i(s instanceof Error?s.message:"Failed to install DuckDB")}finally{B(!1)}}async function T(){const s=N.trim()||void 0;R(!0),i(null),x(null);try{const c=await X(s);x(s?`Queued sync for ${c.release}. Download + query runs in the background.`:`Queued sync for auto-discovered release ${c.release}. Download + query runs in the background.`),g(""),k(!1),await j()}catch(c){c instanceof ee&&c.code==="MISSING_DEPENDENCY"?i("DuckDB is not installed. 
Install it first."):i(c instanceof Error?c.message:"Failed to trigger sync")}finally{R(!1)}}async function Q(s){i(null),x(null);try{await se(s),x(`Pruned cached release ${s}.`),await j()}catch(c){i(c instanceof Error?c.message:"Failed to prune release")}}const G=a?.status==="ready"&&r.every(s=>s.release!==a.release);return e.jsxs("div",{className:"page-container",children:[e.jsx("div",{className:"page-header",children:e.jsxs("div",{className:"page-header-left",children:[e.jsx("h1",{className:"page-title",children:"Backlinks"}),e.jsx("p",{className:"page-subtitle",children:"Find domains that link to your projects, computed from the open Common Crawl web graph. Runs entirely on your machine — nothing is sent to third parties."})]})}),e.jsx(l,{className:"surface-card p-4 mb-6 border-amber-800/60",children:e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(L,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-sm text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Heads up — a release sync is a large download."}),e.jsxs("ul",{className:"mt-1.5 space-y-1 text-zinc-400",children:[e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"~16 GB"})," of gzipped vertex + edge files per release, stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),"."]}),e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"10–20 min on a fast connection"})," for the download, then ~5 min for the DuckDB query."]}),e.jsx("li",{children:"One sync covers every project in this workspace. 
Releases are immutable, so the download only happens once per release."})]})]})]})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"About"}),e.jsx("h2",{children:"How it works"})]})}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-sm text-zinc-400 leading-relaxed max-w-3xl mb-4",children:["Common Crawl publishes a quarterly snapshot of the public web’s hyperlink graph. Canonry downloads one"," ",e.jsx("span",{className:"text-zinc-200",children:"release"})," at a time and extracts backlinks for every project in this workspace in a single pass."]}),e.jsxs("ol",{className:"space-y-3 text-sm text-zinc-400 max-w-3xl",children:[e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"1"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Download (one-time, ~16 GB)"})," — vertex + edge files cached to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),". Runs once per release; subsequent operations reuse the cache."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"2"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Query (~5 min)"})," — one DuckDB pass scans the cached files and extracts referring domains for every project’s canonical domain. 
DuckDB is only used to ",e.jsx("span",{className:"text-zinc-200",children:"read"})," these dumps; it doesn’t store any canonry state."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"3"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Persist"})," — results land in the same SQLite database the rest of canonry uses. After the first sync, per-project reads (and re-run extracts against the cached release) are instant."]})]})]})]})]}),q&&e.jsx(l,{className:"surface-card p-4 mb-4 border-rose-800/60",children:e.jsx("p",{className:"text-sm text-rose-300",children:q})}),I&&e.jsx(l,{className:"surface-card p-4 mb-4 border-emerald-800/60",children:e.jsx("p",{className:"text-sm text-emerald-300",children:I})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Dependency"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["DuckDB install status",e.jsxs(h,{label:"Why DuckDB?",children:[e.jsx("span",{className:"block",children:"DuckDB is a query engine canonry uses to scan the ~16 GB Common Crawl dumps and pull out your referring domains."}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["It does ",e.jsx("span",{className:"text-zinc-200",children:"not"})," store any canonry data — your backlink results live in SQLite alongside the rest of your projects. 
DuckDB is purely a tool for processing the raw CSV files."]}),e.jsxs("span",{className:"mt-2 block text-zinc-500",children:["Installed on demand (not bundled) into ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/plugins/"})," so users who never run backlinks don’t pay the ~40 MB install cost."]})]})]})]}),t?.duckdbInstalled?e.jsx(f,{tone:"positive",children:"Installed"}):e.jsx(f,{tone:"caution",children:"Not installed"})]}),e.jsx(l,{className:"surface-card p-5",children:D?e.jsx("p",{className:"text-sm text-zinc-500",children:"Checking…"}):t?.duckdbInstalled?e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(te,{className:"h-5 w-5 text-emerald-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-sm text-zinc-200",children:["Version ",t.duckdbVersion??"unknown"," installed at"," ",e.jsx("code",{className:"text-zinc-300",children:t.pluginDir})]}),e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Required spec: ",t.duckdbSpec]})]})]}):e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(re,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"flex-1",children:[e.jsx("p",{className:"text-sm text-zinc-200",children:"DuckDB is not installed. It’s the query engine canonry uses to scan Common Crawl dumps — required before you can run a release sync or per-project extract."}),e.jsx("p",{className:"text-xs text-zinc-500 mt-1",children:"Installing doesn’t touch your project data. 
DuckDB only reads the downloaded CSV files; backlink results are written to the same SQLite database canonry already uses."}),t&&e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Will be installed into ",e.jsx("code",{className:"text-zinc-300",children:t.pluginDir})," (~40 MB)."]}),e.jsx("div",{className:"mt-3",children:e.jsxs(z,{type:"button",size:"sm",disabled:S,onClick:A($),children:[e.jsx(ae,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),S?"Installing…":"Install DuckDB"]})})]})]})})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Latest sync"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Release sync",e.jsx(h,{label:"What is a release sync?",children:"A release sync downloads one Common Crawl dump (~16 GB) and extracts backlinks for every project in this workspace in one pass. This is the heavy job — subsequent per-project re-runs skip the download and just re-query the cached files."})]})]}),a&&e.jsx(f,{tone:v(a.status),children:a.status})]}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-xs text-zinc-500 max-w-3xl mb-4",children:["A release is one Common Crawl dump (e.g. ",e.jsx("code",{className:"text-zinc-400",children:"cc-main-2026-jan-feb-mar"}),"). 
Syncing it downloads the graph and populates backlinks for every project in this workspace."]}),a?e.jsxs("div",{className:"space-y-2 text-sm",children:[e.jsxs("p",{className:"text-zinc-200",children:["Release ",e.jsx("code",{className:"text-zinc-300",children:a.release})]}),a.phaseDetail&&e.jsx("p",{className:"text-zinc-500",children:a.phaseDetail}),e.jsxs("div",{className:"grid grid-cols-2 md:grid-cols-4 gap-4 text-xs text-zinc-500 pt-2",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Projects"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:a.projectsProcessed??"—"})]}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-zinc-600 uppercase tracking-wide flex items-center gap-1",children:["Rows",e.jsx(h,{label:"What are rows?",children:"Total number of (project, referring domain) pairs persisted in SQLite from this sync, across every project in the workspace."})]}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:a.domainsDiscovered??"—"})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Started"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(a.downloadStartedAt??a.createdAt)})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Finished"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(a.queryFinishedAt)})]})]}),a.error&&e.jsx("p",{className:"text-sm text-rose-400 pt-2",children:a.error})]}):e.jsx("p",{className:"text-sm text-zinc-500",children:"No release sync has run in this workspace yet."}),G&&e.jsx("div",{className:"mt-4 rounded border border-amber-800/60 bg-amber-950/20 p-3",children:e.jsxs("div",{className:"flex items-start gap-2",children:[e.jsx(L,{className:"h-4 w-4 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-xs text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Cached files for this release 
are missing."}),e.jsxs("p",{className:"mt-1 text-zinc-400",children:["The sync record in the database says this release finished successfully, but the ~16 GB dump at"," ",e.jsxs("code",{className:"text-zinc-300",children:["~/.canonry/cache/commoncrawl/",a?.release,"/"]})," isn’t on disk. Your backlink data is still intact (it lives in SQLite), but per-project re-run extracts will fail until you either re-sync this release or start a new one."]})]})]})}),e.jsxs("div",{className:"mt-4 rounded border border-zinc-800 bg-zinc-900/40 p-3",children:[e.jsxs("div",{className:"flex items-start justify-between gap-3 mb-3",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-[10px] uppercase tracking-wide text-zinc-500",children:"Auto-detected release"}),u?e.jsxs("p",{className:"text-sm text-zinc-200 mt-0.5",children:[e.jsx("code",{className:"text-zinc-100",children:u.release}),e.jsxs("span",{className:"ml-2 text-xs text-zinc-500",children:["— vertex ",w(u.vertexBytes),", edges ",w(u.edgesBytes)]})]}):e.jsx("p",{className:"text-sm text-zinc-500 mt-0.5",children:D?"Probing Common Crawl…":"Could not auto-detect — pass an explicit release below."}),e.jsxs("a",{href:oe,target:"_blank",rel:"noopener noreferrer",className:"mt-1 inline-flex items-center gap-1 text-xs text-zinc-400 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",children:["Browse all Common Crawl web-graph releases",e.jsx(de,{className:"h-3 w-3","aria-hidden":!0})]})]}),e.jsxs("div",{className:"flex items-center gap-2 shrink-0",children:[e.jsxs(z,{type:"button",size:"sm",disabled:p||!t?.duckdbInstalled||!u&&!N.trim(),onClick:A(T),children:[e.jsx(Z,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),p?"Queuing…":"Run sync"]}),e.jsxs(h,{label:"What does Run sync do?",children:[e.jsxs("span",{className:"block",children:["Downloads the auto-detected (or chosen) Common Crawl release (~16 GB) to"," 
",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),", then runs a single DuckDB query that extracts referring domains for every project in this workspace."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["First time for a release: ",e.jsx("span",{className:"text-zinc-200",children:"~10–20 min download + ~5 min query"}),". Re-running the same release later: ",e.jsx("span",{className:"text-zinc-200",children:"skips download, just re-queries"})," (~5 min)."]})]})]})]}),F?e.jsxs("div",{className:"flex flex-wrap items-center gap-2",children:[e.jsx("input",{type:"text",className:"flex-1 min-w-[240px] rounded border border-zinc-700 bg-transparent px-2.5 py-1.5 text-sm text-zinc-200 placeholder-zinc-600 focus:border-zinc-500 focus:outline-none",placeholder:"cc-main-2026-jan-feb-mar",value:N,onChange:s=>g(s.target.value),disabled:p,autoFocus:!0}),e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>{g(""),k(!1)},disabled:p,children:"Cancel"})]}):e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>k(!0),disabled:p,children:"Use a different release →"})]}),!t?.duckdbInstalled&&e.jsx("p",{className:"text-xs text-zinc-600 mt-2",children:"Install DuckDB first to enable sync."})]})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Cached releases"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Local disk cache",e.jsxs(h,{label:"What is this?",children:[e.jsxs("span",{className:"block",children:["Raw Common Crawl dumps stored at"," 
",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/<release>/"}),". Each release takes ~16 GB."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["These files are needed to re-run per-project extracts against a release without re-downloading. Pruning here ",e.jsx("span",{className:"text-zinc-200",children:"does not delete your backlink data"})," — that lives in SQLite."]})]})]})]})}),e.jsx("p",{className:"text-xs text-zinc-500 mb-3 max-w-3xl",children:"Each cached release is a ~16 GB pair of gzipped files. They’re needed to re-query the graph (e.g. for a newly-added project) without re-downloading. Safe to prune — backlink results persist in SQLite."}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Sync status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Size"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Last used"}),e.jsx("th",{className:"px-4 py-2 font-medium sr-only",children:"Actions"})]})}),e.jsxs("tbody",{children:[r.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:s.syncStatus?e.jsx(f,{tone:v(s.syncStatus),children:s.syncStatus}):e.jsx("span",{className:"text-zinc-600",children:"—"})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:w(s.bytes)}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.lastUsedAt)}),e.jsx("td",{className:"px-4 py-2 text-right",children:e.jsxs("div",{className:"inline-flex items-center 
gap-1",children:[e.jsxs(z,{type:"button",variant:"outline",size:"sm",onClick:()=>{Q(s.release)},children:[e.jsx(ne,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),"Prune"]}),e.jsx(h,{label:"What does Prune do?",placement:"top",children:"Deletes the ~16 GB cache for this release from disk. Backlink results already in SQLite remain untouched. To re-run extracts against this release, you’d have to sync it again (another ~16 GB download)."})]})})]},s.release)),r.length===0&&e.jsx("tr",{children:e.jsx("td",{className:"px-4 py-4 text-sm text-zinc-500",colSpan:5,children:"No cached releases on this machine. If you ran a sync from a different machine (or deleted the cache), the backlink data is still in the database — but you’ll need to re-sync a release to run new extracts."})})]})]})})]}),o.length>1&&e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"History"}),e.jsx("h2",{children:"Past release syncs"})]})}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Projects"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Rows"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Finished"})]})}),e.jsx("tbody",{children:o.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 
py-2",children:e.jsx(f,{tone:v(s.status),children:s.status})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.projectsProcessed??"—"}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.domainsDiscovered??"—"}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.queryFinishedAt??s.updatedAt)})]},s.id))})]})})]})]})}export{fe as BacklinksPage};
@@ -1 +1 @@
- import{c as o}from"./index-XUKhruAg.js";const n=[["path",{d:"m15 18-6-6 6-6",key:"1wnfg3"}]],c=o("chevron-left",n),s={contentStyle:{backgroundColor:"#18181b",border:"1px solid #3f3f46",borderRadius:8,fontSize:12},labelStyle:{color:"#e4e4e7"},itemStyle:{color:"#a1a1aa"}},i={fill:"#71717a",fontSize:11},f="#27272a",T="#27272a",d=["#34d399","#60a5fa","#f472b6","#facc15","#a78bfa","#fb923c","#22d3ee","#f87171"],S={text:"#a1a1aa",textDim:"#71717a",textFaint:"#52525b",surface:"#27272a"},l={positiveDeep:"#10b981"};function a(e){const t=String(e);return t.includes("T")?new Date(t):new Date(t+"T00:00:00")}function C(e){return a(String(e)).toLocaleDateString(void 0,{month:"short",day:"numeric",year:"numeric"})}function u(e){const t=a(e);return`${t.getMonth()+1}/${t.getDate()}`}export{T as C,i as a,C as b,s as c,S as d,d as e,u as f,l as g,c as h,f as i};
+ import{c as o}from"./index-BrCh3uvb.js";const n=[["path",{d:"m15 18-6-6 6-6",key:"1wnfg3"}]],c=o("chevron-left",n),s={contentStyle:{backgroundColor:"#18181b",border:"1px solid #3f3f46",borderRadius:8,fontSize:12},labelStyle:{color:"#e4e4e7"},itemStyle:{color:"#a1a1aa"}},i={fill:"#71717a",fontSize:11},f="#27272a",T="#27272a",d=["#34d399","#60a5fa","#f472b6","#facc15","#a78bfa","#fb923c","#22d3ee","#f87171"],S={text:"#a1a1aa",textDim:"#71717a",textFaint:"#52525b",surface:"#27272a"},l={positiveDeep:"#10b981"};function a(e){const t=String(e);return t.includes("T")?new Date(t):new Date(t+"T00:00:00")}function C(e){return a(String(e)).toLocaleDateString(void 0,{month:"short",day:"numeric",year:"numeric"})}function u(e){const t=a(e);return`${t.getMonth()+1}/${t.getDate()}`}export{T as C,i as a,C as b,s as c,S as d,d as e,u as f,l as g,c as h,f as i};