@gscdump/analysis 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +251 -0
- package/dist/analyzer/index.d.mts +893 -0
- package/dist/analyzer/index.mjs +4944 -0
- package/dist/default-registry.d.mts +93 -0
- package/dist/default-registry.mjs +1957 -0
- package/dist/index.d.mts +620 -0
- package/dist/index.mjs +2873 -0
- package/dist/period/index.d.mts +57 -0
- package/dist/period/index.mjs +150 -0
- package/dist/query/index.d.mts +26 -0
- package/dist/query/index.mjs +340 -0
- package/dist/semantic/index.d.mts +70 -0
- package/dist/semantic/index.mjs +391 -0
- package/dist/source/index.d.mts +427 -0
- package/dist/source/index.mjs +1865 -0
- package/package.json +86 -0

package/LICENSE
ADDED

@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2025 Harlan Wilton

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED

@@ -0,0 +1,251 @@

# @gscdump/analysis

[npm version](https://npmjs.com/package/@gscdump/analysis)
[npm downloads](https://npm.chart.dev/@gscdump/analysis)
[license](https://github.com/harlan-zw/gscdump/blob/main/LICENSE)

> SEO analyzers + typed query primitives for Google Search Console data. Row-based, DuckDB-native, D1-ready.

## Install

```bash
npm install @gscdump/analysis
```

## When to use which subpath or package

| Subpath or package | Use when |
|---|---|
| `@gscdump/analysis` | You have arrays of rows (GSC API responses, D1 query results). Pure functions, no DB. |
| `@gscdump/analysis/analyzer` | You need the analyzer contracts (`Analyzer`, `Plan`, `FileSet`), `ROW_ANALYZERS`, the registry, or the dispatcher. |
| `@gscdump/engine-duckdb-node` | You have a Node DuckDB handle over parquet. Ships `SQL_ANALYZERS` + `analyzeInBrowser` for attached-table parquet. |
| `@gscdump/engine-wasm` | Your Nuxt / React / vanilla app runs DuckDB-WASM client-side against R2 parquets. |
| `@gscdump/engine-sqlite` | You target Cloudflare Workers, or anything routing through sqlite-proxy (D1). |
| `@gscdump/engine/resolver` | You need the dialect-neutral SQL composition kit: `ResolverAdapter`, `pgResolverAdapter`, `compilePg`/`compileSqlite`, `resolveToSQL`, source contracts. |
| `@gscdump/analysis/source` | You need portable query sources + source-backed analyzers shared across GSC API, sqlite, DuckDB, and tests. |
| `@gscdump/analysis/semantic` | You need browser-only semantic analyzers such as content-gap; uses the optional `@huggingface/transformers` peer dependency. |
| `@gscdump/analysis/query` | You need `buildDataQueryPlan` / `buildDataDetailPlan` for the generic query analyzers. |

## Row-based analyzers

Pure functions. Take typed arrays in, return typed results out.

```ts
import {
  analyzeBrandSegmentation,
  analyzeClustering,
  analyzeConcentration,
  analyzeDecay,
  analyzeMovers,
  analyzeOpportunity,
  analyzeSeasonality,
  analyzeStrikingDistance,
  padTimeseries,
} from '@gscdump/analysis'

const striking = analyzeStrikingDistance(keywordRows, { minImpressions: 100 })
const movers = analyzeMovers(currentRows, previousRows)
const decay = analyzeDecay(currentRows, previousRows)
```
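
The exact thresholds `analyzeStrikingDistance` applies are not documented here. As a rough illustration of the idea only, the sketch below keeps keyword rows that rank just off page one (assumed here to mean an average position of 11 to 20) and clear a `minImpressions` floor, sorted by impressions. `KeywordRow`, `StrikingOptions`, and `strikingDistanceSketch` are hypothetical names, not this package's API.

```ts
// Hypothetical row shape for illustration; the package's own row types may differ.
interface KeywordRow {
  query: string
  clicks: number
  impressions: number
  position: number // average position, 1 = top result
}

interface StrikingOptions {
  minImpressions: number
  minPosition?: number // assumed lower bound of the band (default 11)
  maxPosition?: number // assumed upper bound of the band (default 20)
}

// Keep rows inside the assumed "striking distance" band that have enough
// impressions, most-seen first.
function strikingDistanceSketch(rows: KeywordRow[], opts: StrikingOptions): KeywordRow[] {
  const lo = opts.minPosition ?? 11
  const hi = opts.maxPosition ?? 20
  return rows
    .filter(r => r.position >= lo && r.position <= hi && r.impressions >= opts.minImpressions)
    .sort((a, b) => b.impressions - a.impressions)
}

const sample: KeywordRow[] = [
  { query: 'gsc export', clicks: 40, impressions: 900, position: 3.2 },
  { query: 'gsc parquet', clicks: 2, impressions: 450, position: 12.4 },
  { query: 'duckdb seo', clicks: 1, impressions: 80, position: 14.0 },
  { query: 'search console api', clicks: 5, impressions: 1200, position: 18.7 },
]

const hits = strikingDistanceSketch(sample, { minImpressions: 100 })
// hits.map(r => r.query) → ['search console api', 'gsc parquet']
```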

Meta-analysis helpers are available too. `analyzeActionPriority` takes an `analyze` callback, so it can sit on top of whatever runner you already have:

```ts
import { analyzeActionPriority } from '@gscdump/analysis'

const prioritized = await analyzeActionPriority({
  analyze: params => runner.analyze(params),
})
```

Source adapters let an analyzer pull its rows from a GSC API client in a single call:

```ts
import {
  analyzeMoversFromSource,
  analyzeStrikingDistanceFromSource,
  createGscApiQuerySource,
} from '@gscdump/analysis'

const source = createGscApiQuerySource({ client, siteUrl })
const movers = await analyzeMoversFromSource(source, { current, previous })
```
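
Conceptually, a movers analysis joins two periods on a key and ranks by change. A minimal sketch, assuming rows keyed by `query` and ranking by absolute click delta; the real `analyzeMovers` may use other metrics, keys, or thresholds, and `PeriodRow`, `Mover`, and `moversSketch` are hypothetical names.

```ts
// Hypothetical minimal row shape for one period.
interface PeriodRow {
  query: string
  clicks: number
}

interface Mover {
  query: string
  current: number
  previous: number
  delta: number
}

// Sum clicks per query (rows for the same query are merged).
function clicksByQuery(rows: PeriodRow[]): Map<string, number> {
  const m = new Map<string, number>()
  for (const r of rows)
    m.set(r.query, (m.get(r.query) ?? 0) + r.clicks)
  return m
}

// Outer-join both periods on `query`; a query missing from one period
// counts as 0 clicks there. Largest absolute change comes first.
function moversSketch(current: PeriodRow[], previous: PeriodRow[]): Mover[] {
  const curr = clicksByQuery(current)
  const prev = clicksByQuery(previous)
  const keys = new Set<string>()
  curr.forEach((_, k) => keys.add(k))
  prev.forEach((_, k) => keys.add(k))
  const out: Mover[] = []
  keys.forEach((query) => {
    const c = curr.get(query) ?? 0
    const p = prev.get(query) ?? 0
    out.push({ query, current: c, previous: p, delta: c - p })
  })
  return out.sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
}

const movers = moversSketch(
  [{ query: 'a', clicks: 50 }, { query: 'b', clicks: 5 }],
  [{ query: 'a', clicks: 20 }, { query: 'c', clicks: 12 }],
)
// movers.map(m => `${m.query}:${m.delta}`) → ['a:30', 'c:-12', 'b:5']
```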

## DuckDB (Node)

SQL-native path: `SQL_ANALYZERS` are dispatched through `runAnalyzerFromSource` against an engine-backed source.

```ts
import { createAnalyzerRegistry, ROW_ANALYZERS, runAnalyzerFromSource } from '@gscdump/analysis/analyzer'
import { createEngineQuerySource } from '@gscdump/analysis/source'
import { SQL_ANALYZERS } from '@gscdump/engine-duckdb-node'

// Assumes `engine` was created beforehand (e.g. via `createEngine` from
// '@gscdump/engine-duckdb-node') and `ctx` carries the tenant, e.g. { userId, siteId }.
const source = createEngineQuerySource({ engine, ctx })
const registry = createAnalyzerRegistry({ rows: ROW_ANALYZERS, sql: SQL_ANALYZERS })
const result = await runAnalyzerFromSource(source, { type: 'striking-distance', minImpressions: 100 }, registry)
```
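
How the registry chooses between SQL and row implementations is not spelled out here. One plausible shape, sketched under that assumption: look the analyzer up by its `type`, prefer the SQL variant when the source can run SQL, and otherwise fetch rows and run the row-based variant in JS. Every name below (`Params`, `SourceSketch`, `RegistrySketch`, `runSketch`) is hypothetical.

```ts
// Hypothetical shapes, for illustration only (not the package's contracts).
interface Params {
  type: string
  [key: string]: unknown
}

interface SourceSketch {
  supportsSql: boolean
  runSql: (sql: string) => Promise<unknown[]>
  fetchRows: (params: Params) => Promise<unknown[]>
}

type RowImpl = (rows: unknown[], params: Params) => unknown
type SqlImpl = (params: Params) => string // builds a SQL statement for the source

interface RegistrySketch {
  rows: Record<string, RowImpl>
  sql: Record<string, SqlImpl>
}

// Prefer pushing work into the database; fall back to fetching rows and
// computing in JS when the source cannot run SQL or no SQL variant exists.
async function runSketch(source: SourceSketch, params: Params, registry: RegistrySketch): Promise<unknown> {
  const sqlImpl: SqlImpl | undefined = registry.sql[params.type]
  if (source.supportsSql && sqlImpl !== undefined)
    return source.runSql(sqlImpl(params))
  const rowImpl: RowImpl | undefined = registry.rows[params.type]
  if (rowImpl === undefined)
    throw new Error(`no analyzer registered for type "${params.type}"`)
  const rows = await source.fetchRows(params)
  return rowImpl(rows, params)
}

// Usage: a toy "count" analyzer with both implementations.
const registry: RegistrySketch = {
  rows: { count: rows => rows.length },
  sql: { count: () => 'SELECT count(*) AS n FROM keywords' },
}
const rowOnlySource: SourceSketch = {
  supportsSql: false,
  runSql: async () => [],
  fetchRows: async () => [1, 2, 3],
}
// runSketch(rowOnlySource, { type: 'count' }, registry) resolves to 3
```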

`attachParquetIndex` and `attachSnapshotIndex` wire data files (per-day or per-month parquet, or pre-baked `.duckdb` snapshots) into a DuckDB session for `analyzeInBrowser` (the attached-table path).
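
For orientation, attaching per-day parquet files usually boils down to a DuckDB view over `read_parquet`, which accepts a list of file paths or URLs. A minimal sketch that only builds the SQL string, assuming one file per day under a hypothetical `keywords/<date>.parquet` layout; `buildParquetViewSql`, `dailyFiles`, and the quoting helpers are illustrative, and the package's attach functions may lay out files and name views differently.

```ts
// Quote a value as a SQL string literal (embedded single quotes doubled).
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`
}

// Quote an identifier (embedded double quotes doubled).
function sqlIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`
}

// Build one view over many parquet files: DuckDB's read_parquet takes a
// list of paths, so N daily files behave like a single logical table.
function buildParquetViewSql(view: string, files: string[]): string {
  if (files.length === 0)
    throw new Error('at least one parquet file is required')
  const list = files.map(sqlString).join(', ')
  return `CREATE OR REPLACE VIEW ${sqlIdent(view)} AS SELECT * FROM read_parquet([${list}])`
}

// Hypothetical per-day layout: keywords/2025-01-01.parquet, ...
function dailyFiles(prefix: string, dates: string[]): string[] {
  return dates.map(d => `${prefix}/${d}.parquet`)
}

const viewSql = buildParquetViewSql('keywords', dailyFiles('keywords', ['2025-01-01', '2025-01-02']))
// CREATE OR REPLACE VIEW "keywords" AS SELECT * FROM read_parquet(['keywords/2025-01-01.parquet', 'keywords/2025-01-02.parquet'])
```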

## Browser (DuckDB-WASM)

Three primitives (a runner, a date window, and a query scope) for client-side analytics:

```ts
import {
  createInsightRunner,
  resolveWindow,
  scopeFor,
  strikingMomentum,
} from '@gscdump/engine-wasm'

const runner = createInsightRunner({ db, conn }) // AsyncDuckDB + connection
// Named `dateWindow` so it does not shadow the browser's global `window`.
const dateWindow = resolveWindow({ preset: 'last-30d', comparison: 'prev-period' })
const scope = scopeFor('pages', { siteId, window: dateWindow })

const rows = await strikingMomentum(runner, { ...scope, limit: 50 })
```
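
`scopeFor` turns a site ID and a date window into boundary predicates so every query stays inside one tenant and one date range. The sketch below shows that idea as a parameterized `WHERE` clause; the column names (`site_id`, `date`), the `DateWindowSketch` and `ScopeSketch` shapes, and the `scopeWhereSketch` / `mergeScopeSketch` helpers are assumptions, not what the package actually emits.

```ts
interface DateWindowSketch {
  start: string // inclusive, YYYY-MM-DD
  end: string // inclusive, YYYY-MM-DD
}

interface ScopeSketch {
  where: string
  params: (string | number)[]
}

// Tenant + date boundary predicates with positional placeholders, so values
// are bound as parameters instead of being spliced into the SQL text.
function scopeWhereSketch(siteId: number, w: DateWindowSketch): ScopeSketch {
  return {
    where: 'site_id = ? AND date BETWEEN ? AND ?',
    params: [siteId, w.start, w.end],
  }
}

// Combine two scopes with AND, keeping parameters in placeholder order.
function mergeScopeSketch(a: ScopeSketch, b: ScopeSketch): ScopeSketch {
  return { where: `(${a.where}) AND (${b.where})`, params: [...a.params, ...b.params] }
}

const base = scopeWhereSketch(42, { start: '2025-01-01', end: '2025-01-30' })
const scoped = mergeScopeSketch(base, { where: 'impressions >= ?', params: [100] })
// scoped.where  → '(site_id = ? AND date BETWEEN ? AND ?) AND (impressions >= ?)'
// scoped.params → [42, '2025-01-01', '2025-01-30', 100]
```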

- `createInsightRunner({ db, conn })` — drizzle-orm handle over DuckDB-WASM. Use typed `.select()` / window functions, or drop down to raw `` sql`...` `` templates.
- `bootDuckDBWasm()` / `attachParquetUrlTables()` / `attachParquetTables()` / `createBrowserAnalysisRuntime()` — reusable browser runtime primitives for booting DuckDB-WASM, attaching parquet-backed views, and exposing `query()` / `analyze()` helpers without rewriting the same glue in every app.
- `resolveWindow({ preset, comparison, anchor })` — canonical date windows, no DB.
- `scopeFor(table, { siteId, window })` + `mergeScope()` — boundary predicates for multi-tenant queries.
- `schema`, `pages`, `keywords`, `page_keywords`, `countries`, `devices` — drizzle schema mirroring `gscdump/analytics` `SCHEMAS`. Drift fails loudly at load.

It vendors a stripped-down drizzle-orm DuckDB-WASM adapter (~240 LoC, adapted from `@proj-airi/drizzle-duckdb-wasm`, MIT). Transactions throw, since the analytics workload is read-only.

`@duckdb/duckdb-wasm` is an optional peer dependency. Bundle: **10.3 kB / 2.72 kB gzipped**.

## SQLite (D1 / Cloudflare Workers)

Mirrors the browser (DuckDB-WASM) API, targeting drizzle's `sqlite-core` dialect.

```ts
import {
  aggClicks,
  aggCtr,
  aggImpressions,
  aggPosition,
  compileSqlite,
  createSqliteInsightRunner,
  gsc_keywords,
  sql,
} from '@gscdump/engine-sqlite'

const queryExpr = sql`
  SELECT ${gsc_keywords.query} as keyword,
    SUM(${gsc_keywords.clicks}) as clicks,
    ${aggCtr(gsc_keywords)} as ctr
  FROM ${gsc_keywords}
  WHERE ${gsc_keywords.site_id} = ${siteId}
  GROUP BY ${gsc_keywords.query}
`
const { sql: compiledSql, params } = compileSqlite(queryExpr)
const rows = await executor(compiledSql, params) // queryUserD1, etc.
```
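
Aggregating GSC metrics has a classic pitfall: averaging per-row CTR or position gives tiny rows as much weight as large ones. A common approach, assumed here to be roughly what `aggCtr` and `aggPosition` encode (not confirmed by this README), is CTR = total clicks / total impressions and an impression-weighted mean position. The sketch computes both over plain rows; `MetricRow`, `Totals`, and `aggregateSketch` are hypothetical.

```ts
interface MetricRow {
  clicks: number
  impressions: number
  position: number // average position for this row
}

interface Totals {
  clicks: number
  impressions: number
  ctr: number // total clicks / total impressions
  position: number // impression-weighted mean position
}

function aggregateSketch(rows: MetricRow[]): Totals {
  let clicks = 0
  let impressions = 0
  let weightedPos = 0
  for (const r of rows) {
    clicks += r.clicks
    impressions += r.impressions
    weightedPos += r.position * r.impressions
  }
  // Guard against division by zero when nothing was shown.
  const ctr = impressions > 0 ? clicks / impressions : 0
  const position = impressions > 0 ? weightedPos / impressions : 0
  return { clicks, impressions, ctr, position }
}

const t = aggregateSketch([
  { clicks: 90, impressions: 1000, position: 2 },
  { clicks: 1, impressions: 1, position: 40 },
])
// t.ctr ≈ 0.0909 (91 / 1001); averaging the row CTRs would give 0.545
// t.position ≈ 2.04 (2040 / 1001); a plain mean of positions would give 21
```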

- `createSqliteInsightRunner({ executor })` — sqlite-proxy drizzle adapter.
- `compileSqlite(sql)` — compile to `{ sql, params }` for any HTTP executor.
- `aggClicks` / `aggImpressions` / `aggCtr` / `aggPosition` — aggregate helpers replacing hand-rolled `METRICS_SQL`.
- Runtime builder exports (`colRef`, `dimColumn`, `metricSql`, `havingPredicates`, …) for dimension- and metric-driven query builders.

Always import `sql` from `@gscdump/engine-sqlite`, not from `drizzle-orm` directly, so consumers bind to the package's drizzle-orm instance and avoid cross-install `SQL<unknown>` type mismatches.

Bundle: **5.3 kB / 1.4 kB gzipped**.

## Query composers (dialect-neutral)

```ts
import { sqliteResolverAdapter } from '@gscdump/engine-sqlite'
import {
  buildTotalsSql,
  pgResolverAdapter,
  resolveToSQL,
  resolveToSQLOptimized,
} from '@gscdump/engine/resolver'

const resolved = resolveToSQL(builderState, { adapter: sqliteResolverAdapter, siteId })
```

Pass `sqliteResolverAdapter` from `@gscdump/engine-sqlite` (D1, `site_id`-scoped) or `pgResolverAdapter` from `@gscdump/engine/resolver` (parquet via DuckDB, single tenant). The composers stay identical; only the column bindings and dialect compilation differ.
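
The adapter split can be pictured as one composer parameterized by a small interface that supplies the table name, an optional tenant column, and the placeholder syntax. This is a hypothetical sketch (`AdapterSketch`, both adapter objects, and `totalsSql` are invented for illustration, not the package's `ResolverAdapter`) of how the same composer can emit SQLite-style `?` placeholders with a `site_id` filter, or Postgres-style `$1` placeholders with no tenant filter.

```ts
interface AdapterSketch {
  table: string
  tenantColumn?: string // e.g. site_id for multi-tenant D1; absent for single-tenant parquet
  placeholder: (index: number) => string // index is 1-based
}

const sqliteAdapterSketch: AdapterSketch = {
  table: 'gsc_keywords',
  tenantColumn: 'site_id',
  placeholder: () => '?',
}

const pgAdapterSketch: AdapterSketch = {
  table: 'keywords',
  placeholder: i => `$${i}`,
}

// One composer: totals over a date range, tenant-scoped only if the adapter says so.
function totalsSql(adapter: AdapterSketch, start: string, end: string, siteId?: number): { sql: string, params: (string | number)[] } {
  const params: (string | number)[] = []
  const conds: string[] = []
  // Record a value and return the dialect's placeholder for it.
  const bind = (v: string | number): string => {
    params.push(v)
    return adapter.placeholder(params.length)
  }
  if (adapter.tenantColumn !== undefined && siteId !== undefined)
    conds.push(`${adapter.tenantColumn} = ${bind(siteId)}`)
  conds.push(`date >= ${bind(start)}`, `date <= ${bind(end)}`)
  return {
    sql: `SELECT SUM(clicks) AS clicks, SUM(impressions) AS impressions FROM ${adapter.table} WHERE ${conds.join(' AND ')}`,
    params,
  }
}

const lite = totalsSql(sqliteAdapterSketch, '2025-01-01', '2025-01-31', 7)
const pg = totalsSql(pgAdapterSketch, '2025-01-01', '2025-01-31')
// lite.sql ends with 'WHERE site_id = ? AND date >= ? AND date <= ?'
// pg.sql   ends with 'WHERE date >= $1 AND date <= $2'
```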

## Sources (portable)

`/source` is the cross-implementation seam:

```ts
import {
  analyzeMoversFromSource,
  createEngineQuerySource,
  queryRows,
} from '@gscdump/analysis/source'

const source = createEngineQuerySource({
  engine,
  ctx: { userId, siteId },
})

const rows = await queryRows(source, builderState)
const movers = await analyzeMoversFromSource(source, periods)
```

Available source factories:

- `createGscApiQuerySource({ client, siteUrl })`
- `createBrowserQuerySource({ query(sql, params) })`
- `createSqliteQuerySource({ executor, siteId })`
- `createEngineQuerySource({ engine, ctx })`
- `createInMemoryQuerySource({ queryRows })`

Portable analyzers currently cover the row-based tools:
`striking-distance`, `opportunity`, `brand`, `clustering`, `concentration`,
`seasonality`, `movers`, and `decay`.
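
Because a portable analyzer only sees the source interface, tests can swap in canned rows. A hypothetical sketch of that seam: `QuerySourceSketch`, `inMemorySourceSketch`, and `topQueriesFromSource` are invented names (the real contract lives in `@gscdump/analysis/source` and is richer), but the shape loosely mirrors `createInMemoryQuerySource({ queryRows })`.

```ts
interface Row {
  query: string
  clicks: number
}

interface QueryStateSketch {
  limit: number
}

// The seam: anything that can answer a query state with rows.
interface QuerySourceSketch {
  queryRows: (state: QueryStateSketch) => Promise<Row[]>
}

// Test double backed by a fixed array (no database involved).
function inMemorySourceSketch(rows: Row[]): QuerySourceSketch {
  return {
    queryRows: async state => rows.slice(0, state.limit),
  }
}

// A portable analyzer: it depends only on the seam, so any backend that
// implements QuerySourceSketch can feed it.
async function topQueriesFromSource(source: QuerySourceSketch, limit: number): Promise<string[]> {
  const rows = await source.queryRows({ limit: 1000 })
  return rows
    .slice() // copy before sorting so the source's data is not mutated
    .sort((a, b) => b.clicks - a.clicks)
    .slice(0, limit)
    .map(r => r.query)
}

const fixture = inMemorySourceSketch([
  { query: 'a', clicks: 3 },
  { query: 'b', clicks: 10 },
  { query: 'c', clicks: 7 },
])
// topQueriesFromSource(fixture, 2) resolves to ['b', 'c']
```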

## Semantic (browser-only)

`/semantic` holds browser-only analysis that depends on client runtime
capabilities rather than SQL backends alone.

```ts
import { analyzeContentGap } from '@gscdump/analysis/semantic'

const result = await analyzeContentGap(runner, {
  maxQueries: 1500,
  minDivergence: 0.12,
})
```

`analyzeContentGap()` loads a MiniLM/BGE embedding model via
`@huggingface/transformers`, caches vectors in IndexedDB, and compares top
queries to candidate URLs derived from `page_keywords`.
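
Embedding comparisons like this are typically scored with cosine similarity. Assuming (not documented here) that `minDivergence` is measured as 1 minus the cosine similarity between a query vector and its closest candidate URL vector, the arithmetic looks like the sketch below. `cosineSimilarity` and `divergenceSketch` are illustrative helpers, and the toy 3-dimensional vectors stand in for real model embeddings.

```ts
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length)
    throw new Error('vectors must have the same dimension')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0)
    return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Divergence of a query from its closest candidate URL (assumed definition).
function divergenceSketch(query: number[], candidates: number[][]): number {
  let best = -1
  for (const c of candidates)
    best = Math.max(best, cosineSimilarity(query, c))
  return 1 - best
}

const queryVec = [1, 0, 0]
const urlVecs = [[0, 1, 0], [1, 1, 0]]
const d = divergenceSketch(queryVec, urlVecs)
// best cosine = 1 / sqrt(2) ≈ 0.707, so d ≈ 0.293, which exceeds a 0.12 threshold
```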

## Window resolution

```ts
import { resolveWindow } from '@gscdump/engine-wasm'

const w = resolveWindow({ preset: 'last-30d', comparison: 'yoy' })
// { start: '...', end: '...', days: 30, comparison: { start, end } }
```

Presets: `last-7d`, `last-28d`, `last-30d`, `last-90d`, `last-180d`, `last-365d`, `mtd`, `ytd`, `custom`. Comparison modes: `none`, `prev-period`, `yoy`.
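
The date arithmetic behind a `last-Nd` preset is small enough to sketch. This hypothetical `resolveLastNDaysSketch` assumes the window ends on the anchor date (inclusive) and spans N days; `prev-period` is the N days immediately before it, and `yoy` shifts both ends back one calendar year. The real `resolveWindow` may apply a data-lag offset or different edge rules.

```ts
type Comparison = 'none' | 'prev-period' | 'yoy'

interface DayRange { start: string, end: string }
interface WindowSketch extends DayRange { days: number, comparison?: DayRange }

const DAY_MS = 24 * 60 * 60 * 1000

// Parse/format YYYY-MM-DD in UTC so local time zones and DST cannot shift dates.
function parseDay(s: string): Date {
  return new Date(`${s}T00:00:00Z`)
}
function formatDay(d: Date): string {
  return d.toISOString().slice(0, 10)
}
function addDays(s: string, n: number): string {
  return formatDay(new Date(parseDay(s).getTime() + n * DAY_MS))
}
function minusOneYear(s: string): string {
  const d = parseDay(s)
  d.setUTCFullYear(d.getUTCFullYear() - 1)
  return formatDay(d)
}

function resolveLastNDaysSketch(days: number, anchor: string, comparison: Comparison): WindowSketch {
  const end = anchor
  const start = addDays(end, -(days - 1)) // inclusive range covering `days` days
  const w: WindowSketch = { start, end, days }
  if (comparison === 'prev-period') {
    const prevEnd = addDays(start, -1)
    w.comparison = { start: addDays(prevEnd, -(days - 1)), end: prevEnd }
  }
  else if (comparison === 'yoy') {
    w.comparison = { start: minusOneYear(start), end: minusOneYear(end) }
  }
  return w
}

const w = resolveLastNDaysSketch(30, '2025-03-31', 'prev-period')
// { start: '2025-03-02', end: '2025-03-31', days: 30,
//   comparison: { start: '2025-01-31', end: '2025-03-01' } }
```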

## Stability

| Surface | Stability |
|---|---|
| Row analyzers (`analyzeStrikingDistance`, `analyzeMovers`, ...) | Public |
| Source factories (`create*QuerySource`) + `analyzeFromSource` | Public |
| Engine factories (`@gscdump/analysis/engine/<name>`'s `createEngine`) | Public |
| `Analyzer<P, R>` contract + `analyzerRegistry` | Public |
| `/period`, `/query`, `/source`, `/semantic` subpaths | Public |
| Internals reached through `@gscdump/analysis/engine/<name>/<file>` not listed above | Private |

## Related

- [`gscdump`](../gscdump) — REST client + query builder (edge-safe).
- [`@gscdump/engine`](../engine) — Parquet/DuckDB storage engine.
- [`@gscdump/cli`](../cli) — CLI wrapping `gscdump` + `@gscdump/engine` + `@gscdump/analysis`.

## License

[MIT](../../LICENSE)