@ainyc/canonry 4.72.3 → 4.74.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27) hide show
  1. package/assets/agent-workspace/skills/canonry/references/canonry-cli.md +17 -0
  2. package/assets/agent-workspace/skills/canonry/references/server-side-traffic.md +56 -3
  3. package/assets/assets/{BacklinksPage-CjfpwZEH.js → BacklinksPage-ClgP7CUd.js} +1 -1
  4. package/assets/assets/{ChartPrimitives-Ckf2FrUy.js → ChartPrimitives-PGDrQBXP.js} +1 -1
  5. package/assets/assets/ProjectPage-xfLeh2vB.js +6 -0
  6. package/assets/assets/{RunRow-BuFyG0V_.js → RunRow-DL-lUm35.js} +1 -1
  7. package/assets/assets/{RunsPage-D-pr000K.js → RunsPage-BCL_lU-R.js} +1 -1
  8. package/assets/assets/{SettingsPage-CiaapCYn.js → SettingsPage-D67UQYJa.js} +1 -1
  9. package/assets/assets/{TrafficPage-B40xytJD.js → TrafficPage-DVRcPxCk.js} +1 -1
  10. package/assets/assets/{TrafficSourceDetailPage-7hHem-gM.js → TrafficSourceDetailPage-JzX1fhGQ.js} +1 -1
  11. package/assets/assets/{extract-error-message-3GkDsu1h.js → extract-error-message-Cia_CilL.js} +1 -1
  12. package/assets/assets/index-CFVX11lK.css +1 -0
  13. package/assets/assets/{index-BVdH2O9w.js → index-DHg9_-PB.js} +118 -118
  14. package/assets/assets/{server-traffic-CsgPsudZ.js → server-traffic-GBmLS3L7.js} +1 -1
  15. package/assets/assets/{trash-2-B8Ipf9rI.js → trash-2-Bk7PYGBN.js} +1 -1
  16. package/assets/index.html +2 -2
  17. package/dist/{chunk-JXFNERK4.js → chunk-A7JX3FZB.js} +1094 -995
  18. package/dist/{chunk-SIB4NMEH.js → chunk-MRC4JMIH.js} +369 -176
  19. package/dist/{chunk-ZUBBADMR.js → chunk-W6GBIRFA.js} +162 -1
  20. package/dist/{chunk-HSX32G47.js → chunk-ZRZHIS22.js} +414 -73
  21. package/dist/cli.js +236 -30
  22. package/dist/index.js +4 -4
  23. package/dist/{intelligence-service-ZW3ARLJT.js → intelligence-service-GPO2VMEC.js} +2 -2
  24. package/dist/mcp.js +2 -2
  25. package/package.json +8 -8
  26. package/assets/assets/ProjectPage-DZeplYeC.js +0 -6
  27. package/assets/assets/index-B3nENtU0.css +0 -1
@@ -167,6 +167,22 @@ cnry sources <project> --rank --format jsonl # stream the ranked domains, one
167
167
  - The ranked list is **not truncated** by default (the old top-5-per-category cap is gone). Pass `--limit N` to cap each list; the response carries `truncatedDomainCount` / `truncatedCitedSlots` so totals always reconcile.
168
168
  - Counts are **cited slots** (grounding citations), so a domain cited 3× in one answer counts 3. Probe runs are excluded.
169
169
 
170
+ ## Technical AEO (site audit)
171
+
172
+ Site-wide technical audit (structured data, AI-readable content, AI-crawler access, content depth/freshness/extractability, …) powered by `@ainyc/aeo-audit`'s `runSitemapAudit`. Runs as the `site-audit` run kind — crawls the project's `sitemap.xml` and scores every reachable page, then rolls up into one 0–100 site score. Pure HTTP, no LLM cost; a large site can take minutes, so it runs in the background.
173
+
174
+ ```bash
175
+ cnry technical-aeo run <project> --wait # crawl + audit; --wait polls to terminal. Idempotent: returns the in-flight run if one is active.
176
+ cnry technical-aeo run <project> --sitemap-url <url> --limit 200 # override the sitemap / cap pages (highest <priority> first; default 500, max 2000)
177
+ cnry technical-aeo score <project> [--format json] # site score + per-factor scorecard (avg + pass/partial/fail per page) + delta vs the previous audit
178
+ cnry technical-aeo pages <project> [--status error] [--sort score-asc|score-desc|url] [--format json|jsonl] # per-page breakdown of the latest run (worst-first by default)
179
+ cnry technical-aeo trend <project> [--format json|jsonl] # aggregate-score history across past audits
180
+ cnry schedule set <project> --kind site-audit --preset weekly # keep it fresh
181
+ ```
182
+
183
+ - The score is only available after at least one audit runs — `score` returns `hasData: false` until then.
184
+ - Reads reflect the latest **completed/partial** site-audit run; probe runs are excluded.
185
+
170
186
  ## Intelligence
171
187
 
172
188
  ```bash
@@ -204,6 +220,7 @@ cnry schedule set <project> --preset daily # or: weekly, twice-daily, daily@
204
220
  cnry schedule set <project> --cron "0 9 * * *" --timezone America/New_York
205
221
  cnry schedule set <project> --kind data-refresh --preset daily # refresh all connected GSC/Bing/GA/GBP integrations (no --source)
206
222
  cnry schedule set <project> --kind backlinks-sync --preset weekly # re-probe Common Crawl; sync only when a newer rolling window is published (no --source/--provider)
223
+ cnry schedule set <project> --kind site-audit --preset weekly # Technical AEO: crawl the sitemap + audit every page (no --source/--provider)
207
224
  cnry schedule show <project>
208
225
  cnry schedule enable <project>
209
226
  cnry schedule disable <project>
@@ -66,8 +66,45 @@ sync flow does NOT echo the private key back in any response.
66
66
  ## Connecting a WordPress source
67
67
 
68
68
  The WordPress adapter pulls events from the **Canonry Traffic Logger**
69
- WordPress plugin, which captures every non-admin GET page-load and
70
- exposes a paginated REST endpoint protected by an Application Password.
69
+ WordPress plugin, which captures every non-admin GET page-load **that
70
+ reaches PHP** and exposes a paginated REST endpoint protected by an
71
+ Application Password.
72
+
73
+ > **Cache blind spot.** The plugin is a PHP hook, so it only sees
74
+ > requests that execute WordPress. A full-page cache (LiteSpeed, WP
75
+ > Rocket, W3 Total Cache, WP Super Cache) or CDN serves cached pages
76
+ > before PHP runs, so cache-served page views, including live AI
77
+ > user-fetches (Claude-User, ChatGPT-User), are NOT logged. Bot crawls
78
+ > of uncached endpoints (sitemap, feeds, assets, cache misses) still
79
+ > come through, which can make capture look healthy while real page
80
+ > views go uncounted. Exclude AI user-agents from the cache (and any
81
+ > CDN), or capture from access/edge logs instead. The
82
+ > `traffic.source.cache-blindspot` doctor check warns whenever a
83
+ > WordPress source is connected.
84
+
85
+ **Which user-agents to exclude from the cache** (one per line in
86
+ LiteSpeed's "Do Not Cache User Agents", WP Rocket's
87
+ `rocket_cache_reject_ua`, or W3TC / WP Super Cache "Rejected User
88
+ Agents"):
89
+
90
+ ```
91
+ Claude-User
92
+ ClaudeBot
93
+ ChatGPT-User
94
+ OAI-SearchBot
95
+ GPTBot
96
+ PerplexityBot
97
+ Perplexity-User
98
+ ```
99
+
100
+ These are the answer-engine fetchers in both live-user-fetch (`*-User`)
101
+ and crawler forms. Do NOT add `Googlebot` or `Bingbot`: caching helps
102
+ search crawlers (page speed is a ranking signal, and cached pages let
103
+ them crawl more per visit, which matters most on crawl-budget-starved
104
+ sites), and their crawl stats are already authoritative in GSC and Bing
105
+ Webmaster Tools. Rule of thumb: bypass cache only for agents you cannot
106
+ measure elsewhere and that gain nothing from being cached. Answer-engine
107
+ fetchers fit both; search crawlers fit neither.
71
108
 
72
109
  ```bash
73
110
  # 1. Install the plugin. Download the latest release zip from the
@@ -223,7 +260,7 @@ MCP `canonry_traffic_status` tool.
223
260
  | Project dashboard `/projects/:name/activity` | Live source table + 24h totals + GA4 referrals (combined view) |
224
261
  | Top-level `/traffic` route | Cross-project source admin (connect, sync, archive) |
225
262
  | `cnry report <project>` (HTML + SPA) | "AI Visibility — Server-Side" section, ranked above Indexing Health |
226
- | `cnry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes` checks |
263
+ | `cnry doctor --project <name>` | `traffic.source.connected`, `recent-data`, `credentials`, `scopes`, `cache-blindspot` checks |
227
264
  | MCP toolkit `traffic` | Tools: `canonry_traffic_status`, `_sources_list`, `_source_get`, `_events`, `_connect_cloud_run`, `_sync` |
228
265
 
229
266
  ## Doctor signals
@@ -237,6 +274,7 @@ The doctor checks are adapter-agnostic. When they fail or warn:
237
274
  | `traffic.source.recent-data` | `traffic.recent-data.stale` | Last sync was >7d ago. Run `cnry traffic sync …` or schedule a recurring sync. |
238
275
  | `traffic.source.recent-data` | `traffic.recent-data.empty` | Source connected but no data in 30d. Verify config and credentials with `cnry traffic sources <project>`. |
239
276
  | `traffic.source.credentials` | `traffic.credentials.resolve-failed` | Service-account key in `~/.canonry/config.yaml` is invalid or expired. Re-connect. |
277
+ | `traffic.source.cache-blindspot` | `traffic.cache-blindspot.wordpress-plugin` | A WordPress source is connected, so the plugin cannot see cache-served page views. Exclude AI user-agents from the page cache and any CDN, or switch to a log/edge source. Warns only, not a failure. |
240
278
 
241
279
  ## Scheduling
242
280
 
@@ -276,6 +314,21 @@ domains, or PII are surfaced.
276
314
 
277
315
  ## Limits & caveats
278
316
 
317
+ - **The WordPress plugin is blind to cache-served traffic.** The
318
+ `wordpress` adapter logs only requests that reach PHP. A full-page
319
+ cache or CDN serves cached pages from the edge, so cache-served page
320
+ views, including live AI user-fetches (Claude-User, ChatGPT-User),
321
+ never reach the plugin and go uncounted, even though bot crawls of
322
+ uncached endpoints (sitemap, assets) still appear. On a cached
323
+ WordPress site, treat the plugin's page-view counts as a floor, not a
324
+ total. Either exclude AI user-agents from the cache + CDN, or capture
325
+ cache-independent via a `cloud-run` / `vercel` / edge-log source. The
326
+ `traffic.source.cache-blindspot` doctor check surfaces this. Adapter
327
+ coverage differs: `vercel` ingests edge request-logs so cache hits are
328
+ captured (it records the `cache` HIT/MISS label), and `cloud-run` logs
329
+ every request that reaches the service, missing only what a CDN placed
330
+ in front of Cloud Run serves from its own edge cache. Only the
331
+ hook-based `wordpress` adapter has the always-present blind spot.
279
332
  - **Path-level citation cross-reference is not implemented yet.** The
280
333
  citation store is domain-grain (`query_snapshots.cited_domains`). A
281
334
  future iteration that lands URL-grain citation evidence will extend
@@ -1 +1 @@
1
- import{r as n,j as e}from"./vendor-tanstack-Dq7p98wZ.js";import{c as E,bC as O,ac as V,bD as U,bE as Y,bF as J,g as l,bG as L,T as f,B as z,i as A,al as K,bH as X,bI as Z,af as ee,bJ as se}from"./index-BVdH2O9w.js";import{C as ae,D as te,T as ne,a as ce}from"./trash-2-B8Ipf9rI.js";import"./vendor-radix-B57xfQbP.js";import"./vendor-recharts-ClRVR6aX.js";import"./vendor-markdown-DK7fbRNb.js";const ie=[["circle",{cx:"12",cy:"12",r:"10",key:"1mglay"}],["line",{x1:"12",x2:"12",y1:"8",y2:"12",key:"1pkeuh"}],["line",{x1:"12",x2:"12.01",y1:"16",y2:"16",key:"4dfq90"}]],re=E("circle-alert",ie);const le=[["path",{d:"M15 3h6v6",key:"1q9fwt"}],["path",{d:"M10 14 21 3",key:"gplh6r"}],["path",{d:"M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6",key:"a6xqqp"}]],de=E("external-link",le);function h({children:a,label:m="More info",placement:t="top",className:d}){const o=n.useId(),[y,r]=n.useState(!1);return e.jsxs("span",{className:`relative inline-flex ${d??""}`,children:[e.jsx("button",{type:"button","aria-label":m,"aria-describedby":y?o:void 0,className:"inline-flex h-4 w-4 items-center justify-center rounded-full text-zinc-500 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500",onMouseEnter:()=>r(!0),onMouseLeave:()=>r(!1),onFocus:()=>r(!0),onBlur:()=>r(!1),children:e.jsx(ce,{className:"h-3.5 w-3.5","aria-hidden":!0})}),y&&e.jsx("span",{id:o,role:"tooltip",className:`absolute z-50 w-64 rounded border border-zinc-700 bg-zinc-900 px-3 py-2 text-xs font-normal leading-relaxed text-zinc-200 shadow-lg ${t==="top"?"bottom-full mb-2":"top-full mt-2"} left-1/2 -translate-x-1/2 whitespace-normal`,children:a})]})}const oe="https://commoncrawl.org/web-graphs";function w(a){return a==null?"—":a>=1e12?`${(a/1e12).toFixed(1)} TB`:a>=1e9?`${(a/1e9).toFixed(1)} GB`:a>=1e6?`${(a/1e6).toFixed(1)} MB`:a>=1e3?`${(a/1e3).toFixed(1)} KB`:`${a} B`}function b(a){if(!a)return"—";const m=Date.now()-new Date(a).getTime(),t=Math.floor(m/6e4);if(t<1)return"just now";if(t<60)return`${t}m ago`;const d=Math.floor(t/60);return d<24?`${d}h ago`:`${Math.floor(d/24)}d ago`}function v(a){switch(a){case"ready":return"positive";case"failed":return"negative";case"downloading":case"querying":case"queued":return"caution"}}function fe(){const[a,m]=n.useState(null),[t,d]=n.useState(null),[o,y]=n.useState([]),[r,F]=n.useState([]),[u,M]=n.useState(null),[D,C]=n.useState(!0),[S,B]=n.useState(!1),[p,R]=n.useState(!1),[N,g]=n.useState(""),[P,k]=n.useState(!1),[I,i]=n.useState(null),[q,x]=n.useState(null),j=n.useCallback(async()=>{C(!0),i(null);try{const[s,c,_,H,W]=await Promise.all([O(),V().catch(()=>null),U().catch(()=>[]),Y().catch(()=>[]),J().catch(()=>null)]);m(s),d(c),y(_),F(H),M(W)}catch(s){i(s instanceof Error?s.message:"Failed to load backlinks status")}finally{C(!1)}},[]);n.useEffect(()=>{j()},[j]);async function $(){B(!0),i(null),x(null);try{const s=await X();x(s.alreadyPresent?`DuckDB already installed (${s.version}).`:`Installed DuckDB ${s.version}.`),await j()}catch(s){i(s instanceof Error?s.message:"Failed to install DuckDB")}finally{B(!1)}}async function T(){const s=N.trim()||void 0;R(!0),i(null),x(null);try{const c=await Z(s);x(s?`Queued sync for ${c.release}. Download + query runs in the background.`:`Queued sync for auto-discovered release ${c.release}. Download + query runs in the background.`),g(""),k(!1),await j()}catch(c){c instanceof ee&&c.code==="MISSING_DEPENDENCY"?i("DuckDB is not installed. Install it first."):i(c instanceof Error?c.message:"Failed to trigger sync")}finally{R(!1)}}async function G(s){i(null),x(null);try{await se(s),x(`Pruned cached release ${s}.`),await j()}catch(c){i(c instanceof Error?c.message:"Failed to prune release")}}const Q=t?.status==="ready"&&r.every(s=>s.release!==t.release);return e.jsxs("div",{className:"page-container",children:[e.jsx("div",{className:"page-header",children:e.jsxs("div",{className:"page-header-left",children:[e.jsx("h1",{className:"page-title",children:"Backlinks"}),e.jsx("p",{className:"page-subtitle",children:"Find domains that link to your projects, computed from the open Common Crawl web graph. Runs entirely on your machine — nothing is sent to third parties."})]})}),e.jsx(l,{className:"surface-card p-4 mb-6 border-amber-800/60",children:e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(L,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-sm text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Heads up — a release sync is a large download."}),e.jsxs("ul",{className:"mt-1.5 space-y-1 text-zinc-400",children:[e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"~16 GB"})," of gzipped vertex + edge files per release, stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),"."]}),e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"10–20 min on a fast connection"})," for the download, then ~5 min for the DuckDB query."]}),e.jsx("li",{children:"One sync covers every project in this workspace. Releases are immutable, so the download only happens once per release."})]})]})]})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"About"}),e.jsx("h2",{children:"How it works"})]})}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-sm text-zinc-400 leading-relaxed max-w-3xl mb-4",children:["Common Crawl publishes a quarterly snapshot of the public web’s hyperlink graph. Canonry downloads one"," ",e.jsx("span",{className:"text-zinc-200",children:"release"})," at a time and extracts backlinks for every project in this workspace in a single pass."]}),e.jsxs("ol",{className:"space-y-3 text-sm text-zinc-400 max-w-3xl",children:[e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"1"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Download (one-time, ~16 GB)"})," — vertex + edge files cached to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),". Runs once per release; subsequent operations reuse the cache."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"2"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Query (~5 min)"})," — one DuckDB pass scans the cached files and extracts referring domains for every project’s canonical domain. DuckDB is only used to ",e.jsx("span",{className:"text-zinc-200",children:"read"})," these dumps; it doesn’t store any canonry state."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"3"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Persist"})," — results land in the same SQLite database the rest of canonry uses. After the first sync, per-project reads (and re-run extracts against the cached release) are instant."]})]})]})]})]}),I&&e.jsx(l,{className:"surface-card p-4 mb-4 border-rose-800/60",children:e.jsx("p",{className:"text-sm text-rose-300",children:I})}),q&&e.jsx(l,{className:"surface-card p-4 mb-4 border-emerald-800/60",children:e.jsx("p",{className:"text-sm text-emerald-300",children:q})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Dependency"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["DuckDB install status",e.jsxs(h,{label:"Why DuckDB?",children:[e.jsx("span",{className:"block",children:"DuckDB is a query engine canonry uses to scan the ~16 GB Common Crawl dumps and pull out your referring domains."}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["It does ",e.jsx("span",{className:"text-zinc-200",children:"not"})," store any canonry data — your backlink results live in SQLite alongside the rest of your projects. DuckDB is purely a tool for processing the raw CSV files."]}),e.jsxs("span",{className:"mt-2 block text-zinc-500",children:["Installed on demand (not bundled) into ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/plugins/"})," so users who never run backlinks don’t pay the ~40 MB install cost."]})]})]})]}),a?.duckdbInstalled?e.jsx(f,{tone:"positive",children:"Installed"}):e.jsx(f,{tone:"caution",children:"Not installed"})]}),e.jsx(l,{className:"surface-card p-5",children:D?e.jsx("p",{className:"text-sm text-zinc-500",children:"Checking…"}):a?.duckdbInstalled?e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(ae,{className:"h-5 w-5 text-emerald-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-sm text-zinc-200",children:["Version ",a.duckdbVersion??"unknown"," installed at"," ",e.jsx("code",{className:"text-zinc-300",children:a.pluginDir})]}),e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Required spec: ",a.duckdbSpec]})]})]}):e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(re,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"flex-1",children:[e.jsx("p",{className:"text-sm text-zinc-200",children:"DuckDB is not installed. It’s the query engine canonry uses to scan Common Crawl dumps — required before you can run a release sync or per-project extract."}),e.jsx("p",{className:"text-xs text-zinc-500 mt-1",children:"Installing doesn’t touch your project data. DuckDB only reads the downloaded CSV files; backlink results are written to the same SQLite database canonry already uses."}),a&&e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Will be installed into ",e.jsx("code",{className:"text-zinc-300",children:a.pluginDir})," (~40 MB)."]}),e.jsx("div",{className:"mt-3",children:e.jsxs(z,{type:"button",size:"sm",disabled:S,onClick:A($),children:[e.jsx(te,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),S?"Installing…":"Install DuckDB"]})})]})]})})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Latest sync"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Release sync",e.jsx(h,{label:"What is a release sync?",children:"A release sync downloads one Common Crawl dump (~16 GB) and extracts backlinks for every project in this workspace in one pass. This is the heavy job — subsequent per-project re-runs skip the download and just re-query the cached files."})]})]}),t&&e.jsx(f,{tone:v(t.status),children:t.status})]}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-xs text-zinc-500 max-w-3xl mb-4",children:["A release is one Common Crawl dump (e.g. ",e.jsx("code",{className:"text-zinc-400",children:"cc-main-2026-jan-feb-mar"}),"). Syncing it downloads the graph and populates backlinks for every project in this workspace."]}),t?e.jsxs("div",{className:"space-y-2 text-sm",children:[e.jsxs("p",{className:"text-zinc-200",children:["Release ",e.jsx("code",{className:"text-zinc-300",children:t.release})]}),t.phaseDetail&&e.jsx("p",{className:"text-zinc-500",children:t.phaseDetail}),e.jsxs("div",{className:"grid grid-cols-2 md:grid-cols-4 gap-4 text-xs text-zinc-500 pt-2",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Projects"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:t.projectsProcessed??"—"})]}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-zinc-600 uppercase tracking-wide flex items-center gap-1",children:["Rows",e.jsx(h,{label:"What are rows?",children:"Total number of (project, referring domain) pairs persisted in SQLite from this sync, across every project in the workspace."})]}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:t.domainsDiscovered??"—"})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Started"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(t.downloadStartedAt??t.createdAt)})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Finished"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(t.queryFinishedAt)})]})]}),t.error&&e.jsx("p",{className:"text-sm text-rose-400 pt-2",children:t.error})]}):e.jsx("p",{className:"text-sm text-zinc-500",children:"No release sync has run in this workspace yet."}),Q&&e.jsx("div",{className:"mt-4 rounded border border-amber-800/60 bg-amber-950/20 p-3",children:e.jsxs("div",{className:"flex items-start gap-2",children:[e.jsx(L,{className:"h-4 w-4 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-xs text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Cached files for this release are missing."}),e.jsxs("p",{className:"mt-1 text-zinc-400",children:["The sync record in the database says this release finished successfully, but the ~16 GB dump at"," ",e.jsxs("code",{className:"text-zinc-300",children:["~/.canonry/cache/commoncrawl/",t?.release,"/"]})," isn’t on disk. Your backlink data is still intact (it lives in SQLite), but per-project re-run extracts will fail until you either re-sync this release or start a new one."]})]})]})}),e.jsxs("div",{className:"mt-4 rounded border border-zinc-800 bg-zinc-900/40 p-3",children:[e.jsxs("div",{className:"flex items-start justify-between gap-3 mb-3",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-[10px] uppercase tracking-wide text-zinc-500",children:"Auto-detected release"}),u?e.jsxs("p",{className:"text-sm text-zinc-200 mt-0.5",children:[e.jsx("code",{className:"text-zinc-100",children:u.release}),e.jsxs("span",{className:"ml-2 text-xs text-zinc-500",children:["— vertex ",w(u.vertexBytes),", edges ",w(u.edgesBytes)]})]}):e.jsx("p",{className:"text-sm text-zinc-500 mt-0.5",children:D?"Probing Common Crawl…":"Could not auto-detect — pass an explicit release below."}),e.jsxs("a",{href:oe,target:"_blank",rel:"noopener noreferrer",className:"mt-1 inline-flex items-center gap-1 text-xs text-zinc-400 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",children:["Browse all Common Crawl web-graph releases",e.jsx(de,{className:"h-3 w-3","aria-hidden":!0})]})]}),e.jsxs("div",{className:"flex items-center gap-2 shrink-0",children:[e.jsxs(z,{type:"button",size:"sm",disabled:p||!a?.duckdbInstalled||!u&&!N.trim(),onClick:A(T),children:[e.jsx(K,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),p?"Queuing…":"Run sync"]}),e.jsxs(h,{label:"What does Run sync do?",children:[e.jsxs("span",{className:"block",children:["Downloads the auto-detected (or chosen) Common Crawl release (~16 GB) to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),", then runs a single DuckDB query that extracts referring domains for every project in this workspace."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["First time for a release: ",e.jsx("span",{className:"text-zinc-200",children:"~10–20 min download + ~5 min query"}),". Re-running the same release later: ",e.jsx("span",{className:"text-zinc-200",children:"skips download, just re-queries"})," (~5 min)."]})]})]})]}),P?e.jsxs("div",{className:"flex flex-wrap items-center gap-2",children:[e.jsx("input",{type:"text",className:"flex-1 min-w-[240px] rounded border border-zinc-700 bg-transparent px-2.5 py-1.5 text-sm text-zinc-200 placeholder-zinc-600 focus:border-zinc-500 focus:outline-none",placeholder:"cc-main-2026-jan-feb-mar",value:N,onChange:s=>g(s.target.value),disabled:p,autoFocus:!0}),e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>{g(""),k(!1)},disabled:p,children:"Cancel"})]}):e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>k(!0),disabled:p,children:"Use a different release →"})]}),!a?.duckdbInstalled&&e.jsx("p",{className:"text-xs text-zinc-600 mt-2",children:"Install DuckDB first to enable sync."})]})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Cached releases"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Local disk cache",e.jsxs(h,{label:"What is this?",children:[e.jsxs("span",{className:"block",children:["Raw Common Crawl dumps stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/<release>/"}),". Each release takes ~16 GB."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["These files are needed to re-run per-project extracts against a release without re-downloading. Pruning here ",e.jsx("span",{className:"text-zinc-200",children:"does not delete your backlink data"})," — that lives in SQLite."]})]})]})]})}),e.jsx("p",{className:"text-xs text-zinc-500 mb-3 max-w-3xl",children:"Each cached release is a ~16 GB pair of gzipped files. They’re needed to re-query the graph (e.g. for a newly-added project) without re-downloading. Safe to prune — backlink results persist in SQLite."}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Sync status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Size"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Last used"}),e.jsx("th",{className:"px-4 py-2 font-medium sr-only",children:"Actions"})]})}),e.jsxs("tbody",{children:[r.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:s.syncStatus?e.jsx(f,{tone:v(s.syncStatus),children:s.syncStatus}):e.jsx("span",{className:"text-zinc-600",children:"—"})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:w(s.bytes)}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.lastUsedAt)}),e.jsx("td",{className:"px-4 py-2 text-right",children:e.jsxs("div",{className:"inline-flex items-center gap-1",children:[e.jsxs(z,{type:"button",variant:"outline",size:"sm",onClick:()=>{G(s.release)},children:[e.jsx(ne,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),"Prune"]}),e.jsx(h,{label:"What does Prune do?",placement:"top",children:"Deletes the ~16 GB cache for this release from disk. Backlink results already in SQLite remain untouched. To re-run extracts against this release, you’d have to sync it again (another ~16 GB download)."})]})})]},s.release)),r.length===0&&e.jsx("tr",{children:e.jsx("td",{className:"px-4 py-4 text-sm text-zinc-500",colSpan:5,children:"No cached releases on this machine. If you ran a sync from a different machine (or deleted the cache), the backlink data is still in the database — but you’ll need to re-sync a release to run new extracts."})})]})]})})]}),o.length>1&&e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"History"}),e.jsx("h2",{children:"Past release syncs"})]})}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Projects"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Rows"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Finished"})]})}),e.jsx("tbody",{children:o.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:e.jsx(f,{tone:v(s.status),children:s.status})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.projectsProcessed??"—"}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.domainsDiscovered??"—"}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.queryFinishedAt??s.updatedAt)})]},s.id))})]})})]})]})}export{fe as BacklinksPage};
1
+ import{r as n,j as e}from"./vendor-tanstack-Dq7p98wZ.js";import{c as E,bL as O,ac as V,bM as U,bN as Y,bO as K,g as l,aE as L,T as f,B as z,i as A,al as J,bP as X,bQ as Z,af as ee,bR as se}from"./index-DHg9_-PB.js";import{C as ae,D as te,T as ne,a as ce}from"./trash-2-Bk7PYGBN.js";import"./vendor-radix-B57xfQbP.js";import"./vendor-recharts-ClRVR6aX.js";import"./vendor-markdown-DK7fbRNb.js";const ie=[["circle",{cx:"12",cy:"12",r:"10",key:"1mglay"}],["line",{x1:"12",x2:"12",y1:"8",y2:"12",key:"1pkeuh"}],["line",{x1:"12",x2:"12.01",y1:"16",y2:"16",key:"4dfq90"}]],re=E("circle-alert",ie);const le=[["path",{d:"M15 3h6v6",key:"1q9fwt"}],["path",{d:"M10 14 21 3",key:"gplh6r"}],["path",{d:"M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6",key:"a6xqqp"}]],de=E("external-link",le);function h({children:a,label:m="More info",placement:t="top",className:d}){const o=n.useId(),[y,r]=n.useState(!1);return e.jsxs("span",{className:`relative inline-flex ${d??""}`,children:[e.jsx("button",{type:"button","aria-label":m,"aria-describedby":y?o:void 0,className:"inline-flex h-4 w-4 items-center justify-center rounded-full text-zinc-500 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500",onMouseEnter:()=>r(!0),onMouseLeave:()=>r(!1),onFocus:()=>r(!0),onBlur:()=>r(!1),children:e.jsx(ce,{className:"h-3.5 w-3.5","aria-hidden":!0})}),y&&e.jsx("span",{id:o,role:"tooltip",className:`absolute z-50 w-64 rounded border border-zinc-700 bg-zinc-900 px-3 py-2 text-xs font-normal leading-relaxed text-zinc-200 shadow-lg ${t==="top"?"bottom-full mb-2":"top-full mt-2"} left-1/2 -translate-x-1/2 whitespace-normal`,children:a})]})}const oe="https://commoncrawl.org/web-graphs";function w(a){return a==null?"—":a>=1e12?`${(a/1e12).toFixed(1)} TB`:a>=1e9?`${(a/1e9).toFixed(1)} GB`:a>=1e6?`${(a/1e6).toFixed(1)} MB`:a>=1e3?`${(a/1e3).toFixed(1)} KB`:`${a} B`}function b(a){if(!a)return"—";const m=Date.now()-new Date(a).getTime(),t=Math.floor(m/6e4);if(t<1)return"just now";if(t<60)return`${t}m ago`;const d=Math.floor(t/60);return d<24?`${d}h ago`:`${Math.floor(d/24)}d ago`}function v(a){switch(a){case"ready":return"positive";case"failed":return"negative";case"downloading":case"querying":case"queued":return"caution"}}function fe(){const[a,m]=n.useState(null),[t,d]=n.useState(null),[o,y]=n.useState([]),[r,M]=n.useState([]),[u,P]=n.useState(null),[D,C]=n.useState(!0),[S,B]=n.useState(!1),[p,R]=n.useState(!1),[N,g]=n.useState(""),[F,k]=n.useState(!1),[q,i]=n.useState(null),[I,x]=n.useState(null),j=n.useCallback(async()=>{C(!0),i(null);try{const[s,c,_,W,H]=await Promise.all([O(),V().catch(()=>null),U().catch(()=>[]),Y().catch(()=>[]),K().catch(()=>null)]);m(s),d(c),y(_),M(W),P(H)}catch(s){i(s instanceof Error?s.message:"Failed to load backlinks status")}finally{C(!1)}},[]);n.useEffect(()=>{j()},[j]);async function $(){B(!0),i(null),x(null);try{const s=await X();x(s.alreadyPresent?`DuckDB already installed (${s.version}).`:`Installed DuckDB ${s.version}.`),await j()}catch(s){i(s instanceof Error?s.message:"Failed to install DuckDB")}finally{B(!1)}}async function T(){const s=N.trim()||void 0;R(!0),i(null),x(null);try{const c=await Z(s);x(s?`Queued sync for ${c.release}. Download + query runs in the background.`:`Queued sync for auto-discovered release ${c.release}. Download + query runs in the background.`),g(""),k(!1),await j()}catch(c){c instanceof ee&&c.code==="MISSING_DEPENDENCY"?i("DuckDB is not installed. Install it first."):i(c instanceof Error?c.message:"Failed to trigger sync")}finally{R(!1)}}async function Q(s){i(null),x(null);try{await se(s),x(`Pruned cached release ${s}.`),await j()}catch(c){i(c instanceof Error?c.message:"Failed to prune release")}}const G=t?.status==="ready"&&r.every(s=>s.release!==t.release);return e.jsxs("div",{className:"page-container",children:[e.jsx("div",{className:"page-header",children:e.jsxs("div",{className:"page-header-left",children:[e.jsx("h1",{className:"page-title",children:"Backlinks"}),e.jsx("p",{className:"page-subtitle",children:"Find domains that link to your projects, computed from the open Common Crawl web graph. Runs entirely on your machine — nothing is sent to third parties."})]})}),e.jsx(l,{className:"surface-card p-4 mb-6 border-amber-800/60",children:e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(L,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-sm text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Heads up — a release sync is a large download."}),e.jsxs("ul",{className:"mt-1.5 space-y-1 text-zinc-400",children:[e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"~16 GB"})," of gzipped vertex + edge files per release, stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),"."]}),e.jsxs("li",{children:[e.jsx("span",{className:"text-zinc-200",children:"10–20 min on a fast connection"})," for the download, then ~5 min for the DuckDB query."]}),e.jsx("li",{children:"One sync covers every project in this workspace. Releases are immutable, so the download only happens once per release."})]})]})]})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"About"}),e.jsx("h2",{children:"How it works"})]})}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-sm text-zinc-400 leading-relaxed max-w-3xl mb-4",children:["Common Crawl publishes a quarterly snapshot of the public web’s hyperlink graph. Canonry downloads one"," ",e.jsx("span",{className:"text-zinc-200",children:"release"})," at a time and extracts backlinks for every project in this workspace in a single pass."]}),e.jsxs("ol",{className:"space-y-3 text-sm text-zinc-400 max-w-3xl",children:[e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"1"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Download (one-time, ~16 GB)"})," — vertex + edge files cached to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),". Runs once per release; subsequent operations reuse the cache."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"2"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Query (~5 min)"})," — one DuckDB pass scans the cached files and extracts referring domains for every project’s canonical domain. DuckDB is only used to ",e.jsx("span",{className:"text-zinc-200",children:"read"})," these dumps; it doesn’t store any canonry state."]})]}),e.jsxs("li",{className:"flex gap-3",children:[e.jsx("span",{className:"shrink-0 inline-flex h-6 w-6 items-center justify-center rounded-full border border-zinc-700 bg-zinc-900 text-xs font-semibold text-zinc-300 tabular-nums",children:"3"}),e.jsxs("span",{children:[e.jsx("span",{className:"text-zinc-200 font-medium",children:"Persist"})," — results land in the same SQLite database the rest of canonry uses. After the first sync, per-project reads (and re-run extracts against the cached release) are instant."]})]})]})]})]}),q&&e.jsx(l,{className:"surface-card p-4 mb-4 border-rose-800/60",children:e.jsx("p",{className:"text-sm text-rose-300",children:q})}),I&&e.jsx(l,{className:"surface-card p-4 mb-4 border-emerald-800/60",children:e.jsx("p",{className:"text-sm text-emerald-300",children:I})}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Dependency"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["DuckDB install status",e.jsxs(h,{label:"Why DuckDB?",children:[e.jsx("span",{className:"block",children:"DuckDB is a query engine canonry uses to scan the ~16 GB Common Crawl dumps and pull out your referring domains."}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["It does ",e.jsx("span",{className:"text-zinc-200",children:"not"})," store any canonry data — your backlink results live in SQLite alongside the rest of your projects. DuckDB is purely a tool for processing the raw CSV files."]}),e.jsxs("span",{className:"mt-2 block text-zinc-500",children:["Installed on demand (not bundled) into ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/plugins/"})," so users who never run backlinks don’t pay the ~40 MB install cost."]})]})]})]}),a?.duckdbInstalled?e.jsx(f,{tone:"positive",children:"Installed"}):e.jsx(f,{tone:"caution",children:"Not installed"})]}),e.jsx(l,{className:"surface-card p-5",children:D?e.jsx("p",{className:"text-sm text-zinc-500",children:"Checking…"}):a?.duckdbInstalled?e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(ae,{className:"h-5 w-5 text-emerald-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-sm text-zinc-200",children:["Version ",a.duckdbVersion??"unknown"," installed at"," ",e.jsx("code",{className:"text-zinc-300",children:a.pluginDir})]}),e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Required spec: ",a.duckdbSpec]})]})]}):e.jsxs("div",{className:"flex items-start gap-3",children:[e.jsx(re,{className:"h-5 w-5 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"flex-1",children:[e.jsx("p",{className:"text-sm text-zinc-200",children:"DuckDB is not installed. It’s the query engine canonry uses to scan Common Crawl dumps — required before you can run a release sync or per-project extract."}),e.jsx("p",{className:"text-xs text-zinc-500 mt-1",children:"Installing doesn’t touch your project data. DuckDB only reads the downloaded CSV files; backlink results are written to the same SQLite database canonry already uses."}),a&&e.jsxs("p",{className:"text-xs text-zinc-500 mt-1",children:["Will be installed into ",e.jsx("code",{className:"text-zinc-300",children:a.pluginDir})," (~40 MB)."]}),e.jsx("div",{className:"mt-3",children:e.jsxs(z,{type:"button",size:"sm",disabled:S,onClick:A($),children:[e.jsx(te,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),S?"Installing…":"Install DuckDB"]})})]})]})})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsxs("div",{className:"section-head section-head-inline",children:[e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Latest sync"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Release sync",e.jsx(h,{label:"What is a release sync?",children:"A release sync downloads one Common Crawl dump (~16 GB) and extracts backlinks for every project in this workspace in one pass. This is the heavy job — subsequent per-project re-runs skip the download and just re-query the cached files."})]})]}),t&&e.jsx(f,{tone:v(t.status),children:t.status})]}),e.jsxs(l,{className:"surface-card p-5",children:[e.jsxs("p",{className:"text-xs text-zinc-500 max-w-3xl mb-4",children:["A release is one Common Crawl dump (e.g. ",e.jsx("code",{className:"text-zinc-400",children:"cc-main-2026-jan-feb-mar"}),"). Syncing it downloads the graph and populates backlinks for every project in this workspace."]}),t?e.jsxs("div",{className:"space-y-2 text-sm",children:[e.jsxs("p",{className:"text-zinc-200",children:["Release ",e.jsx("code",{className:"text-zinc-300",children:t.release})]}),t.phaseDetail&&e.jsx("p",{className:"text-zinc-500",children:t.phaseDetail}),e.jsxs("div",{className:"grid grid-cols-2 md:grid-cols-4 gap-4 text-xs text-zinc-500 pt-2",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Projects"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:t.projectsProcessed??"—"})]}),e.jsxs("div",{children:[e.jsxs("p",{className:"text-zinc-600 uppercase tracking-wide flex items-center gap-1",children:["Rows",e.jsx(h,{label:"What are rows?",children:"Total number of (project, referring domain) pairs persisted in SQLite from this sync, across every project in the workspace."})]}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:t.domainsDiscovered??"—"})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Started"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(t.downloadStartedAt??t.createdAt)})]}),e.jsxs("div",{children:[e.jsx("p",{className:"text-zinc-600 uppercase tracking-wide",children:"Finished"}),e.jsx("p",{className:"text-zinc-300 mt-0.5",children:b(t.queryFinishedAt)})]})]}),t.error&&e.jsx("p",{className:"text-sm text-rose-400 pt-2",children:t.error})]}):e.jsx("p",{className:"text-sm text-zinc-500",children:"No release sync has run in this workspace yet."}),G&&e.jsx("div",{className:"mt-4 rounded border border-amber-800/60 bg-amber-950/20 p-3",children:e.jsxs("div",{className:"flex items-start gap-2",children:[e.jsx(L,{className:"h-4 w-4 text-amber-400 shrink-0 mt-0.5","aria-hidden":!0}),e.jsxs("div",{className:"text-xs text-zinc-300 leading-relaxed",children:[e.jsx("p",{className:"font-medium text-amber-200",children:"Cached files for this release are missing."}),e.jsxs("p",{className:"mt-1 text-zinc-400",children:["The sync record in the database says this release finished successfully, but the ~16 GB dump at"," ",e.jsxs("code",{className:"text-zinc-300",children:["~/.canonry/cache/commoncrawl/",t?.release,"/"]})," isn’t on disk. Your backlink data is still intact (it lives in SQLite), but per-project re-run extracts will fail until you either re-sync this release or start a new one."]})]})]})}),e.jsxs("div",{className:"mt-4 rounded border border-zinc-800 bg-zinc-900/40 p-3",children:[e.jsxs("div",{className:"flex items-start justify-between gap-3 mb-3",children:[e.jsxs("div",{children:[e.jsx("p",{className:"text-[10px] uppercase tracking-wide text-zinc-500",children:"Auto-detected release"}),u?e.jsxs("p",{className:"text-sm text-zinc-200 mt-0.5",children:[e.jsx("code",{className:"text-zinc-100",children:u.release}),e.jsxs("span",{className:"ml-2 text-xs text-zinc-500",children:["— vertex ",w(u.vertexBytes),", edges ",w(u.edgesBytes)]})]}):e.jsx("p",{className:"text-sm text-zinc-500 mt-0.5",children:D?"Probing Common Crawl…":"Could not auto-detect — pass an explicit release below."}),e.jsxs("a",{href:oe,target:"_blank",rel:"noopener noreferrer",className:"mt-1 inline-flex items-center gap-1 text-xs text-zinc-400 hover:text-zinc-200 focus:text-zinc-200 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",children:["Browse all Common Crawl web-graph releases",e.jsx(de,{className:"h-3 w-3","aria-hidden":!0})]})]}),e.jsxs("div",{className:"flex items-center gap-2 shrink-0",children:[e.jsxs(z,{type:"button",size:"sm",disabled:p||!a?.duckdbInstalled||!u&&!N.trim(),onClick:A(T),children:[e.jsx(J,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),p?"Queuing…":"Run sync"]}),e.jsxs(h,{label:"What does Run sync do?",children:[e.jsxs("span",{className:"block",children:["Downloads the auto-detected (or chosen) Common Crawl release (~16 GB) to"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/"}),", then runs a single DuckDB query that extracts referring domains for every project in this workspace."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["First time for a release: ",e.jsx("span",{className:"text-zinc-200",children:"~10–20 min download + ~5 min query"}),". Re-running the same release later: ",e.jsx("span",{className:"text-zinc-200",children:"skips download, just re-queries"})," (~5 min)."]})]})]})]}),F?e.jsxs("div",{className:"flex flex-wrap items-center gap-2",children:[e.jsx("input",{type:"text",className:"flex-1 min-w-[240px] rounded border border-zinc-700 bg-transparent px-2.5 py-1.5 text-sm text-zinc-200 placeholder-zinc-600 focus:border-zinc-500 focus:outline-none",placeholder:"cc-main-2026-jan-feb-mar",value:N,onChange:s=>g(s.target.value),disabled:p,autoFocus:!0}),e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>{g(""),k(!1)},disabled:p,children:"Cancel"})]}):e.jsx("button",{type:"button",className:"text-xs text-zinc-500 hover:text-zinc-300 focus:text-zinc-300 focus:outline-none focus-visible:ring-1 focus-visible:ring-zinc-500 rounded",onClick:()=>k(!0),disabled:p,children:"Use a different release →"})]}),!a?.duckdbInstalled&&e.jsx("p",{className:"text-xs text-zinc-600 mt-2",children:"Install DuckDB first to enable sync."})]})]}),e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"Cached releases"}),e.jsxs("h2",{className:"flex items-center gap-2",children:["Local disk cache",e.jsxs(h,{label:"What is this?",children:[e.jsxs("span",{className:"block",children:["Raw Common Crawl dumps stored at"," ",e.jsx("code",{className:"text-zinc-300",children:"~/.canonry/cache/commoncrawl/<release>/"}),". Each release takes ~16 GB."]}),e.jsxs("span",{className:"mt-2 block text-zinc-400",children:["These files are needed to re-run per-project extracts against a release without re-downloading. Pruning here ",e.jsx("span",{className:"text-zinc-200",children:"does not delete your backlink data"})," — that lives in SQLite."]})]})]})]})}),e.jsx("p",{className:"text-xs text-zinc-500 mb-3 max-w-3xl",children:"Each cached release is a ~16 GB pair of gzipped files. They’re needed to re-query the graph (e.g. for a newly-added project) without re-downloading. Safe to prune — backlink results persist in SQLite."}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Sync status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Size"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Last used"}),e.jsx("th",{className:"px-4 py-2 font-medium sr-only",children:"Actions"})]})}),e.jsxs("tbody",{children:[r.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:s.syncStatus?e.jsx(f,{tone:v(s.syncStatus),children:s.syncStatus}):e.jsx("span",{className:"text-zinc-600",children:"—"})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:w(s.bytes)}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.lastUsedAt)}),e.jsx("td",{className:"px-4 py-2 text-right",children:e.jsxs("div",{className:"inline-flex items-center gap-1",children:[e.jsxs(z,{type:"button",variant:"outline",size:"sm",onClick:()=>{Q(s.release)},children:[e.jsx(ne,{className:"h-4 w-4 mr-1.5","aria-hidden":!0}),"Prune"]}),e.jsx(h,{label:"What does Prune do?",placement:"top",children:"Deletes the ~16 GB cache for this release from disk. Backlink results already in SQLite remain untouched. To re-run extracts against this release, you’d have to sync it again (another ~16 GB download)."})]})})]},s.release)),r.length===0&&e.jsx("tr",{children:e.jsx("td",{className:"px-4 py-4 text-sm text-zinc-500",colSpan:5,children:"No cached releases on this machine. If you ran a sync from a different machine (or deleted the cache), the backlink data is still in the database — but you’ll need to re-sync a release to run new extracts."})})]})]})})]}),o.length>1&&e.jsxs("section",{className:"page-section-divider",children:[e.jsx("div",{className:"section-head section-head-inline",children:e.jsxs("div",{children:[e.jsx("p",{className:"eyebrow eyebrow-soft",children:"History"}),e.jsx("h2",{children:"Past release syncs"})]})}),e.jsx(l,{className:"surface-card overflow-hidden",children:e.jsxs("table",{className:"w-full text-sm",children:[e.jsx("thead",{children:e.jsxs("tr",{className:"border-b border-zinc-800 text-left text-xs uppercase tracking-wide text-zinc-600",children:[e.jsx("th",{className:"px-4 py-2 font-medium",children:"Release"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Status"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Projects"}),e.jsx("th",{className:"px-4 py-2 text-right font-medium",children:"Rows"}),e.jsx("th",{className:"px-4 py-2 font-medium",children:"Finished"})]})}),e.jsx("tbody",{children:o.map(s=>e.jsxs("tr",{className:"border-b border-zinc-900 last:border-0",children:[e.jsx("td",{className:"px-4 py-2 text-zinc-200",children:e.jsx("code",{children:s.release})}),e.jsx("td",{className:"px-4 py-2",children:e.jsx(f,{tone:v(s.status),children:s.status})}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.projectsProcessed??"—"}),e.jsx("td",{className:"px-4 py-2 text-right text-zinc-400 tabular-nums",children:s.domainsDiscovered??"—"}),e.jsx("td",{className:"px-4 py-2 text-zinc-400",children:b(s.queryFinishedAt??s.updatedAt)})]},s.id))})]})})]})]})}export{fe as BacklinksPage};
@@ -1 +1 @@
1
- import{c as n}from"./index-BVdH2O9w.js";const r=[["path",{d:"m15 18-6-6 6-6",key:"1wnfg3"}]],s=n("chevron-left",r),f={contentStyle:{backgroundColor:"#18181b",border:"1px solid #3f3f46",borderRadius:8,fontSize:12},labelStyle:{color:"#e4e4e7"},itemStyle:{color:"#a1a1aa"}},d={fill:"#71717a",fontSize:11},l="#27272a",S="#27272a",a=["#34d399","#60a5fa","#f472b6","#facc15","#a78bfa","#fb923c","#22d3ee","#f87171"],c={gemini:"#60a5fa",openai:"#34d399",claude:"#fb923c",perplexity:"#22d3ee",local:"#a78bfa"};function R(t,e=0){return c[t]??a[e%a.length]}const T={text:"#a1a1aa",textDim:"#71717a",textFaint:"#52525b",surface:"#27272a"},u={positive:"#34d399",positiveDeep:"#10b981"};function o(t){const e=String(t);return e.includes("T")?new Date(e):new Date(e+"T00:00:00")}function C(t){return o(String(t)).toLocaleDateString(void 0,{month:"short",day:"numeric",year:"numeric"})}function _(t){const e=o(t);return`${e.getMonth()+1}/${e.getDate()}`}export{S as C,d as a,C as b,f as c,T as d,a as e,_ as f,u as g,s as h,l as i,R as p};
1
+ import{c as n}from"./index-DHg9_-PB.js";const r=[["path",{d:"m15 18-6-6 6-6",key:"1wnfg3"}]],s=n("chevron-left",r),f={contentStyle:{backgroundColor:"#18181b",border:"1px solid #3f3f46",borderRadius:8,fontSize:12},labelStyle:{color:"#e4e4e7"},itemStyle:{color:"#a1a1aa"}},d={fill:"#71717a",fontSize:11},l="#27272a",S="#27272a",a=["#34d399","#60a5fa","#f472b6","#facc15","#a78bfa","#fb923c","#22d3ee","#f87171"],c={gemini:"#60a5fa",openai:"#34d399",claude:"#fb923c",perplexity:"#22d3ee",local:"#a78bfa"};function R(t,e=0){return c[t]??a[e%a.length]}const T={text:"#a1a1aa",textDim:"#71717a",textFaint:"#52525b",surface:"#27272a"},u={positive:"#34d399",positiveDeep:"#10b981"};function o(t){const e=String(t);return e.includes("T")?new Date(e):new Date(e+"T00:00:00")}function C(t){return o(String(t)).toLocaleDateString(void 0,{month:"short",day:"numeric",year:"numeric"})}function _(t){const e=o(t);return`${e.getMonth()+1}/${e.getDate()}`}export{S as C,d as a,C as b,f as c,T as d,a as e,_ as f,u as g,s as h,l as i,R as p};