soup-chop 1.0.3 → 1.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +80 -36
  2. package/dist/index.js +1718 -340
  3. package/package.json +3 -1
package/README.md CHANGED
@@ -242,6 +242,21 @@ Target selection is explicit:
242
242
  `soupChop` does **not** merge local and published corpora implicitly. When you
243
243
  select a local target, the server stays on that filesystem corpus only.
244
244
 
245
+ Source discovery is broader than package-shipped markdown alone:
246
+
247
+ - `README.md`
248
+ - package-local markdown under `doc/` and `docs/`
249
+ - a small allowlist of top-level markdown docs such as `API.md`, `FAQ.md`, `MIGRATING.md`, `UPGRADING.md`, `CHANGELOG.md`, and `CONTRIBUTING.md`
250
+ - synthetic `source_jsdoc` entries generated from exported TypeScript JSDoc/TSDoc discovered from local packages, published npm tarballs, and GitHub/GitLab repositories
251
+ - same-origin website docs crawled from an explicit `docs_url`, or auto-detected from README links when a package is website-first
252
+ - repository-derived wiki pages indexed as `wiki_doc` when a GitHub / GitLab wiki is discoverable
253
+ - one-hop local workspace dependency docs when resolving a package through `workspace_path`
254
+
255
+ Website pages are indexed as `website_doc` sources and use `origin` values such as `website/dockview.dev/docs/core/overview`.
256
+ Synthetic JSDoc pages use origins like `jsdoc/createstableclient`. Wiki pages use `wiki/...` origins.
257
+
258
+ For website ingestion, soup-chop currently stays **Node-only**: it supports static HTML crawling plus manifest-aware discovery such as VitePress `hashmap.json` and sitemap-backed doc sites, but intentionally does **not** require Playwright or a browser runtime for JS-rendered SPA fallback.
259
+
245
260
  `compare_versions` and `get_upgrade_guide` currently remain **npm-only**.
246
261
 
247
262
  ### `get_capabilities`
@@ -263,7 +278,7 @@ metadata such as source precedence, cache freshness, and the resolved local
263
278
  package root.
264
279
 
265
280
  This is the best tool to call when you want to verify which files were actually
266
- analyzed before following up with `search_docs`, `get_toc`, or the Mincer tools.
281
+ analyzed before following up with `search_docs`, `get_toc`, or the Mincer tools. For website-first packages, this can include `website_doc` entries in addition to packaged markdown.
267
282
 
268
283
  **Parameters:**
269
284
 
@@ -271,6 +286,7 @@ analyzed before following up with `search_docs`, `get_toc`, or the Mincer tools.
271
286
  | :-------- | :------- | :------- | :---------------------------------------------------------------------- |
272
287
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
273
288
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
289
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
274
290
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
275
291
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
276
292
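The parameter table above implies a couple of consistency rules: `workspace_path` requires `package`, and `docs_url` is an optional website entry URL. A minimal sketch of checking those rules before making a call is shown here. The `TargetArgs` interface and `checkTargetArgs` helper are hypothetical names used only for illustration, the http(s) restriction is an assumption, and the `https://` scheme in the sample URL is guessed from the `website/dockview.dev/...` origin shown in the README; none of this is soup-chop's actual validation code.

```typescript
// Hypothetical argument shape mirroring the README parameter table.
// This is not soup-chop's real type definition.
interface TargetArgs {
  package?: string;
  version?: string;
  docs_url?: string;
  local_path?: string;
  workspace_path?: string;
}

// Returns a list of problems; an empty list means the arguments look consistent.
function checkTargetArgs(args: TargetArgs): string[] {
  const problems: string[] = [];

  // Stated in the table: `workspace_path` requires `package`.
  if (args.workspace_path !== undefined && !args.package) {
    problems.push("workspace_path requires package");
  }

  // Assumption: a docs entry URL should be an absolute http(s) URL.
  if (args.docs_url !== undefined) {
    try {
      const url = new URL(args.docs_url);
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        problems.push("docs_url must be an http(s) URL");
      }
    } catch {
      problems.push("docs_url is not a valid absolute URL");
    }
  }
  return problems;
}

// Example arguments for a website-first package.
const call: TargetArgs = {
  package: "dockview",
  version: "5.1.0",
  docs_url: "https://dockview.dev/docs/core/overview",
};
console.log(checkTargetArgs(call));
```

Returning a list of problems rather than throwing keeps the check easy to reuse across every tool that accepts the same target parameters.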
 
@@ -289,40 +305,46 @@ analyzed before following up with `search_docs`, `get_toc`, or the Mincer tools.
289
305
 
290
306
  ```json
291
307
  {
292
- "package": "@acme/docs-pkg",
293
- "version": "1.5.0",
308
+ "package": "dockview",
309
+ "version": "5.1.0",
294
310
  "sources": [
295
- {
296
- "sourceId": "api__md",
297
- "sourceKind": "top_level_doc",
298
- "origin": "API.md",
299
- "title": "API.md",
300
- "lineCount": 12
301
- },
302
- {
303
- "sourceId": "docs__guide__md",
304
- "sourceKind": "package_doc",
305
- "origin": "docs/guide.md",
306
- "title": "guide.md",
307
- "lineCount": 18
308
- },
309
311
  {
310
312
  "sourceId": "readme__md",
311
313
  "sourceKind": "readme",
312
314
  "origin": "README.md",
313
315
  "title": "README.md",
314
- "lineCount": 42
316
+ "lineCount": 81
317
+ },
318
+ {
319
+ "sourceId": "website__dockview__dev__docs__core__overview",
320
+ "sourceKind": "website_doc",
321
+ "origin": "website/dockview.dev/docs/core/overview",
322
+ "title": "Overview | Dockview",
323
+ "lineCount": 3
324
+ },
325
+ {
326
+ "sourceId": "website__dockview__dev__docs__core__panels__register",
327
+ "sourceKind": "website_doc",
328
+ "origin": "website/dockview.dev/docs/core/panels/register",
329
+ "title": "Registering Panels | Dockview",
330
+ "lineCount": 5
315
331
  }
316
332
  ],
317
333
  "debug": {
318
334
  "precedence": "explicit_target_only",
319
- "targetMode": "local",
320
- "freshness": "mutable_local_filesystem",
321
- "cacheEnabled": false,
322
- "cacheReason": "Local filesystem targets are mutable, so corpus and search caches stay disabled.",
323
- "cachePaths": {},
324
- "packageRoot": "/home/me/dev/acme/packages/docs-pkg",
325
- "sourceOrigins": ["API.md", "docs/guide.md", "README.md"]
335
+ "targetMode": "npm",
336
+ "freshness": "immutable_published_version",
337
+ "cacheEnabled": true,
338
+ "cacheReason": "Exact published npm versions are immutable, so manifest and search caches are safe.",
339
+ "cachePaths": {
340
+ "manifest": "/home/me/.cache/soup-chop/dockview/5.1.0/sources-manifest.json",
341
+ "searchIndex": "/home/me/.cache/soup-chop/dockview/5.1.0/search-index.json"
342
+ },
343
+ "sourceOrigins": [
344
+ "README.md",
345
+ "website/dockview.dev/docs/core/overview",
346
+ "website/dockview.dev/docs/core/panels/register"
347
+ ]
326
348
  }
327
349
  }
328
350
  ```
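The example responses in this hunk suggest a regular relationship between `origin` and `sourceId`: `README.md` pairs with `readme__md`, `docs/guide.md` with `docs__guide__md`, and `website/dockview.dev/docs/core/overview` with `website__dockview__dev__docs__core__overview`. The sketch below reproduces that pattern. It is inferred only from those sample pairs; `sourceIdFromOrigin` is a hypothetical name, and how the real implementation treats other characters is not shown in the diff.

```typescript
// Hypothetical helper inferred from the sample pairs in the README example.
// Lowercases the origin and replaces each "/" and "." with "__".
// Other characters are left untouched because the samples do not cover them.
function sourceIdFromOrigin(origin: string): string {
  return origin.toLowerCase().replace(/[\/.]/g, "__");
}

const sampleOrigins = [
  "README.md",
  "API.md",
  "docs/guide.md",
  "website/dockview.dev/docs/core/overview",
];

for (const origin of sampleOrigins) {
  console.log(`${origin} -> ${sourceIdFromOrigin(origin)}`);
}
```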
@@ -334,20 +356,22 @@ analyzed before following up with `search_docs`, `get_toc`, or the Mincer tools.
334
356
  Returns a structured table of contents for a package's markdown docs.
335
357
 
336
358
  Without `origin`, it returns a package-wide multi-document TOC that includes
337
- synthetic file-root entries such as `README.md`, `CHANGELOG.md`, or
338
- `docs/faq.md`.
359
+ synthetic file-root entries such as `README.md`, `CHANGELOG.md`, `docs/faq.md`, or website origins such as `website/dockview.dev/docs/core/overview`.
339
360
 
340
361
  With `origin`, it returns the TOC for a single markdown document.
341
362
 
363
+ For `README.md`, soup-chop also adds direct synthetic entries for common top-level sections such as `Installation`, `API`, `Usage`, and `FAQ` so callers can jump straight to those sections without the full hierarchical path.
364
+
342
365
  **Parameters:**
343
366
 
344
367
  | Name | Type | Required | Description |
345
368
  | :-------- | :------- | :------- | :------------------------------------- |
346
369
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
347
370
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
371
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
348
372
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
349
373
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
350
- | `origin` | `string` | No | Markdown source path from `search_docs` such as `"docs/faq.md"` or `"CHANGELOG.md"` |
374
+ | `origin` | `string` | No | Source path from `search_docs` such as `"docs/faq.md"`, `"CHANGELOG.md"`, or `"website/dockview.dev/docs/core/overview"` |
351
375
 
352
376
  **Example call:**
353
377
 
@@ -424,16 +448,20 @@ document.
424
448
  You can identify the target either with `topic_id`, or with `path` plus an
425
449
  optional `origin`.
426
450
 
451
+ For website docs, `origin` and `topic_id` use the normalized website source path returned by `search_docs` and `get_toc`.
452
+ The same applies to synthetic README section shortcuts such as `README.md#Installation`.
453
+
427
454
  **Parameters:**
428
455
 
429
456
  | Name | Type | Required | Description |
430
457
  | :-------- | :------- | :------- | :------------------------------------------------------- |
431
458
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
432
459
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
460
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
433
461
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
434
462
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
435
463
  | `path` | `string` | No | Section path from `get_toc` (e.g., `"Zod/Installation"`) |
436
- | `origin` | `string` | No | Markdown source path from `search_docs` such as `"docs/faq.md"` or `"CHANGELOG.md"` |
464
+ | `origin` | `string` | No | Source path from `search_docs` such as `"docs/faq.md"`, `"CHANGELOG.md"`, or `"website/dockview.dev/docs/core/panels/register"` |
437
465
  | `topic_id`| `string` | No | Stable topic identifier from `search_docs` or aggregated `get_toc` |
438
466
 
439
467
  **Example call using `topic_id`:**
@@ -475,10 +503,11 @@ extraction scope.
475
503
  | :-------- | :------- | :------- | :---------------------------------------------------------------------- |
476
504
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
477
505
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
506
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
478
507
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
479
508
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
480
509
  | `path` | `string` | No | Optional section path from `get_toc` |
481
- | `origin` | `string` | No | Optional markdown source path such as `"docs/faq.md"` or `"CHANGELOG.md"` |
510
+ | `origin` | `string` | No | Optional source path such as `"docs/faq.md"`, `"CHANGELOG.md"`, or `"website/dockview.dev/docs/core/panels/add"` |
482
511
  | `topic_id`| `string` | No | Stable topic identifier from `search_docs` or aggregated `get_toc` |
483
512
 
484
513
  **Example call:**
@@ -510,6 +539,7 @@ the documentation sections that mention it.
510
539
  | :-------- | :------- | :------- | :---------------------------------------------------------------------- |
511
540
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
512
541
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
542
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
513
543
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
514
544
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
515
545
  | `name` | `string` | Yes | Identifier or literal symbol to locate |
@@ -535,7 +565,7 @@ the documentation sections that mention it.
535
565
 
536
566
  Searches `README.md`, package-local markdown under `doc/` and `docs/`, and a
537
567
  small allowlist of top-level markdown docs such as `API.md`, `FAQ.md`,
538
- `MIGRATING.md`, `UPGRADING.md`, `CHANGELOG.md`, and `CONTRIBUTING.md`.
568
+ `MIGRATING.md`, `UPGRADING.md`, `CHANGELOG.md`, and `CONTRIBUTING.md`. For website-first packages, soup-chop can also crawl same-origin website docs and index them as `website_doc` sources.
539
569
 
540
570
  Results are ranked lexically and return the source origin, section path, line
541
571
  ranges, a stable `topicId`, a short preview, and lightweight example metadata
@@ -553,6 +583,7 @@ tokenization and ranking.
553
583
  | :-------- | :------- | :------- | :---------------------------------------------------------------------- |
554
584
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
555
585
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
586
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
556
587
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
557
588
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
558
589
  | `query` | `string` | Yes | Keyword query or natural-language question |
@@ -608,6 +639,7 @@ than to raw document navigation.
608
639
  | :-------- | :------- | :------- | :---------------------------------------------------------------------- |
609
640
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
610
641
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
642
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
611
643
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
612
644
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
613
645
  | `query` | `string` | Yes | Keyword query or natural-language question |
@@ -661,6 +693,7 @@ range, snippet, and detector names so the caller can pivot back into
661
693
  | :-------- | :------- | :------- | :----------------------------------------------------------------------- |
662
694
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
663
695
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
696
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
664
697
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
665
698
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
666
699
  | `limit` | `number` | No | Maximum number of ranked trap findings to return, from `1` to `50`, default `10` |
@@ -732,6 +765,7 @@ broader caveat-oriented `get_traps` view.
732
765
  | :-------- | :------- | :------- | :----------------------------------------------------------------------------- |
733
766
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
734
767
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
768
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
735
769
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
736
770
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
737
771
  | `limit` | `number` | No | Maximum number of ranked deprecation findings to return, from `1` to `50`, default `10` |
@@ -799,6 +833,7 @@ general caveats.
799
833
  | :-------- | :------- | :------- | :--------------------------------------------------------------------------- |
800
834
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
801
835
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
836
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
802
837
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
803
838
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
804
839
  | `limit` | `number` | No | Maximum number of ranked constraint findings to return, from `1` to `50`, default `10` |
@@ -866,6 +901,7 @@ operational caveats.
866
901
  | :-------- | :------- | :------- | :----------------------------------------------------------------------------- |
867
902
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
868
903
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
904
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
869
905
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
870
906
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
871
907
  | `limit` | `number` | No | Maximum number of ranked performance findings to return, from `1` to `50`, default `10` |
@@ -933,6 +969,7 @@ guidance rather than general caveats or performance notes.
933
969
  | :-------- | :------- | :------- | :----------------------------------------------------------------------------- |
934
970
  | `package` | `string` | No | NPM package name, or workspace package name when using `workspace_path` |
935
971
  | `version` | `string` | No | NPM version string for remote usage, or optional exact local-version check |
972
+ | `docs_url` | `string` | No | Optional docs website entry URL; when provided, soup-chop crawls this website as an additional documentation source |
936
973
  | `local_path` | `string` | No | Local package directory or local `package.json` path |
937
974
  | `workspace_path` | `string` | No | Local monorepo / workspace root; requires `package` |
938
975
  | `limit` | `number` | No | Maximum number of ranked security findings to return, from `1` to `50`, default `10` |
@@ -1021,7 +1058,7 @@ package, showing added, removed, and unchanged sections.
1021
1058
  | Name | Type | Required | Description |
1022
1059
  | :-------- | :------- | :------- | :------------------------------------------------------------------------------------- |
1023
1060
  | `package` | `string` | Yes | NPM package name |
1024
- | `v_old` | `string` | Yes | NPM version string: exact version or published dist-tag (e.g., `"3.22.0"`, `"latest"`) |
1061
+ | `v_old` | `string` | Yes | NPM version string: exact version or published dist-tag (e.g., `"3.22.0"`, `"oldest"`) (jk) |
1025
1062
  | `v_new` | `string` | Yes | NPM version string: exact version or published dist-tag (e.g., `"3.23.8"`, `"latest"`) |
1026
1063
 
1027
1064
  ### `get_upgrade_guide`
@@ -1100,11 +1137,12 @@ You ask about express@4.18.2
1100
1137
  **Key design decisions:**
1101
1138
 
1102
1139
  - **Version-scoped caching** — Each `package@version` pair is cached independently.
1103
- Since published NPM versions are immutable, the cache never goes stale.
1140
+ Since published NPM versions are immutable, packaged markdown, website-source manifests, search indexes, and findings caches can be reused safely.
1104
1141
  - **No API keys required** — Everything runs locally. No LLM calls, no embeddings
1105
- API, no external services beyond UNPKG (a public CDN).
1142
+ API, no external services beyond UNPKG (a public CDN) and the target documentation website when `docs_url` or README-based website detection is used.
1106
1143
  - **Surgical delivery** — The AI gets a *table of contents*, not the full README.
1107
1144
  It can then request only the section it needs, keeping context windows lean.
1145
+ - **Website-first docs support** — Packages whose real docs live on a website can be indexed through bounded same-origin crawling and HTML-to-Markdown normalization.
1108
1146
 
1109
1147
  ---
1110
1148
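The "bounded same-origin crawling" mentioned in the design decisions can be illustrated with a small link filter. This is a sketch under stated assumptions, not the package's crawler: `sameOriginDocOrigins`, `tryParseUrl`, and the `maxPages` budget of 50 are hypothetical, and the `website/<host>/<path>` label format is inferred from the origin examples in the README.

```typescript
// Parse an href relative to a base URL; return null for malformed input.
function tryParseUrl(href: string, base: URL): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Keep only links that share the entry URL's origin, and derive a
// `website/<host>/<path>` label for each. The page budget bounds the crawl.
function sameOriginDocOrigins(
  entryUrl: string,
  hrefs: string[],
  maxPages = 50, // assumed budget; the real limit is not stated in the diff
): string[] {
  const entry = new URL(entryUrl);
  const seen = new Set<string>();
  for (const href of hrefs) {
    if (seen.size >= maxPages) break;
    const target = tryParseUrl(href, entry);
    if (target === null) continue; // skip malformed hrefs
    if (target.origin !== entry.origin) continue; // same-origin only
    const path = target.pathname.replace(/\/+$/, ""); // drop trailing slashes
    seen.add(`website/${target.host}${path}`); // fragments and queries are ignored
  }
  return Array.from(seen);
}

const origins = sameOriginDocOrigins("https://dockview.dev/docs/core/overview", [
  "/docs/core/panels/register",
  "https://github.com/example/repo",
  "./overview#intro",
  "https://dockview.dev/docs/core/panels/add/",
]);
console.log(origins);
```

Deduplicating on the derived label rather than the raw href means that fragment-only variants of the same page count once against the budget.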
 
@@ -1127,10 +1165,16 @@ Cache is organized as:
1127
1165
  ~/.cache/soup-chop/
1128
1166
  ├── express/
1129
1167
  │ └── 4.18.2/
1130
- └── README.md
1168
+ ├── README.md
1169
+ │ ├── sources-manifest.json
1170
+ │ ├── search-index.json
1171
+ │ └── mincer-findings.json
1131
1172
  ├── zod/
1132
1173
  │ └── 3.23.8/
1133
- └── README.md
1174
+ ├── README.md
1175
+ │ ├── sources-manifest.json
1176
+ │ ├── search-index.json
1177
+ │ └── mincer-findings.json
1134
1178
  └── ...
1135
1179
  ```
1136
1180
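The cache tree above, together with the `cachePaths` values in the `get_capabilities` example, suggests a `<cache root>/<package>/<version>/<file>` layout. A minimal sketch of computing those paths follows. `cachePathsFor` is a hypothetical helper, the default root assumes `~/.cache/soup-chop`, and the diff does not show how scoped package names such as `@acme/docs-pkg` are laid out on disk.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: builds the per-version cache file paths shown in the
// README's cache tree. Not taken from soup-chop's source.
function cachePathsFor(
  pkg: string,
  version: string,
  cacheRoot: string = path.join(os.homedir(), ".cache", "soup-chop"),
) {
  const dir = path.join(cacheRoot, pkg, version);
  return {
    readme: path.join(dir, "README.md"),
    manifest: path.join(dir, "sources-manifest.json"),
    searchIndex: path.join(dir, "search-index.json"),
    findings: path.join(dir, "mincer-findings.json"),
  };
}

const zodPaths = cachePathsFor("zod", "3.23.8");
console.log(zodPaths.manifest);
```

Keying the directory on both package name and exact version matches the README's point that published npm versions are immutable, so an entry never needs invalidation once written.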