npm - sigmap - Versions diffs - 7.30.0 → 8.0.0 - Mend

sigmap 7.30.0 → 8.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/CHANGELOG.md +23 -0
package/README.md +9 -9
package/gen-context.js +581 -73
package/gen-project-map.js +14 -6
package/llms-full.txt +5 -5
package/llms.txt +5 -5
package/package.json +2 -1
package/packages/cli/package.json +1 -1
package/packages/core/package.json +1 -1
package/src/eval/runner.js +9 -61
package/src/evidence/pack.js +42 -8
package/src/map/build-ci.js +91 -0
package/src/map/config-manifest.js +101 -0
package/src/map/env-schema.js +90 -0
package/src/map/migrations.js +84 -0
package/src/mcp/handlers.js +5 -1
package/src/mcp/server.js +1 -1
package/src/retrieval/bm25.js +122 -0
package/src/retrieval/ranker.js +15 -1

package/CHANGELOG.md CHANGED

@@ -10,6 +10,29 @@ Format: [Semantic Versioning](https://semver.org/)
 ---
+## [8.0.0] — 2026-07-04
+Major release — **v8.5 "Repo-Context Coverage & Test Discovery" (C1 + C2 + C3).** Marks the v8 milestone: the signature map now reaches beyond functions/classes/routes into the repo's operational surface, impl→test discovery is measured rather than best-effort, and every Evidence Pack file carries a risk label from a richer, precedence-ordered set. All zero-dependency, deterministic, and in-boundary with the North-Star constraints. **No breaking API changes** — the `8.0.0` bump aligns the published version with the roadmap's v8 framing; existing `riskLabel`/`relatedTests` consumers keep working.
+### Added
+- **C1 — Repo-context coverage expansion (#402):** four dedicated zero-dep map analyzers under `src/map/` (mirroring `route-table.js`), wired into `gen-project-map.js` and the MCP `get_map` `MAP_SECTIONS` — `env-schema.js` (**Environment variables** — env reads across JS/TS/Python/Ruby/Go + `.env.example` keys), `build-ci.js` (**Build & CI** — npm/pnpm scripts, GitHub Actions workflows, Makefile targets), `config-manifest.js` (**Config & manifests** — package manifests across npm/Python/Rust/Go/Maven/Gradle/Ruby/PHP + notable config files), and `migrations.js` (**Database migrations** — Rails/Alembic/Prisma/Flyway/timestamped-SQL detection). `PROJECT_MAP.md` and `get_map` now surface all four sections.
+- **C2 — Measured test discovery (#402):** `findRelatedTests` now normalizes cross-language test conventions (`test_x.py`↔`x.py`, `x_test.go`↔`x.go`, `XTest.java`↔`X.java`, `x.spec.ts`↔`x.ts`). New reproducible benchmark `scripts/run-test-discovery-benchmark.mjs` (`npm run benchmark:test-discovery`) scores it against an independent canonical-name gold oracle over `benchmarks/repos` — no LLM, pure string math — measuring **F1 98.0%, hit@1 97.4% across 28 repos / 3,701 gold pairs**. The headline number is surfaced in `benchmarks/latest.json` under `test_discovery`.
+- **C3 — Richer risk labels (#402):** `riskLabelFor` now returns the v8.5 set — `migration | payment | auth | security | public-api | config | test | generated | source` — with strict most-specific-risk precedence (a migration touching auth is still `migration`; payment/auth outrank the generic `security` bucket). `test`/`generated`/`config`/`source` semantics are preserved so `findRelatedTests` and the verifier keep working. Extended coverage in `test/integration/evidence-pack.test.js`, `project-map.test.js`, and `benchmark-latest.test.js`.
+### Fixed
+- **Comparison-chart correctness multiplier (#399):** corrected the stale answer-correctness multiplier badge in `docs/comparison-chart.svg` (×5.2 → ×6.8) to match the current task-success benchmark.
+## [7.31.0] — 2026-07-02
+Minor release — **identifier-aware BM25 re-ranker.** Plain exact-token TF-IDF missed queries whose terms live *inside* code identifiers — `component emit` never surfaced `componentEmits` because that is one token sharing no exact term with the query. This was the dominant retrieval-miss cause. The new ranker splits identifiers, stems lightly, boosts path tokens, and scores with length-normalized BM25. Deterministic, zero new dependencies, no LLM/embeddings.
+### Added
+- **Identifier-aware BM25 re-ranker (#395, #396):** new zero-dependency `src/retrieval/bm25.js` with (1) identifier-aware tokenization (split camelCase / snake_case), (2) light stemming (`emits` → `emit`, `options` → `option`), (3) path-token boost (filename weighted 3×), and (4) BM25 length-normalized scoring instead of raw TF-IDF. Wired into the core ranker (`src/retrieval/ranker.js`) as the base relevance score (so `sigmap ask`, `sigmap --query`, and MCP `query_context` all benefit), with the existing negative-signal penalty and recency/graph/learned boosts layered on top. Also drives the benchmark runner (`src/eval/runner.js`) and the dev retrieval benchmark.
+- **BM25 unit tests (#396):** `test/integration/bm25.test.js` covers tokenization, stemming, path boost, the `component emit` → `componentEmits` motivating case, and deterministic tie-breaking.
+### Changed
+- **Retrieval benchmark refreshed:** on the 18-repo / 90-task suite, hit@5 rose **75.6% → 86.7%** (retrieval lift 5.6× → 6.4×), with rank-1 gains on flask, spring-petclinic, rails, and svelte (60% → 100%). The task-completion proxy also improved (task success 52.2% → 67.8%, prompts/task 1.72 → 1.46) since it retrieves through the same ranker. Residual misses (vapor, serilog) are files whose signatures genuinely lack the query vocabulary — out of scope, they need semantic retrieval.
 ## [7.30.0] — 2026-06-23
 Minor release — **v8.0 E2 + E4 (the "Pivot"):** completes v8.0 by repositioning every public surface to the chosen framing — *"the deterministic, verifiable grounding layer for AI code work"* — and framing coding agents as **consumers, not competitors**. The Evidence Pack code (E1/E3/D3 + `mcp install`) already shipped in 7.27–7.29; this is the positioning half. Docs/strings only — no runtime behaviour change, zero new dependencies.
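The C1 entry says `env-schema.js` collects env reads across JS/TS/Python/Ruby/Go plus `.env.example` keys, but the diff above does not show how. The following is a minimal sketch of that kind of zero-dependency, regex-based analyzer. The function name `collectEnvVars`, the `{ path, text }` input shape, and every pattern (including the assumption that variable names are UPPER_SNAKE_CASE string literals) are hypothetical and not taken from the package.

```javascript
// Hypothetical sketch of a zero-dependency env-var analyzer (not sigmap's env-schema.js).
// Assumes UPPER_SNAKE_CASE names passed as string literals; dynamic keys are not detected.
const ENV_READ_PATTERNS = [
  /process\.env\.([A-Z_][A-Z0-9_]*)/g,                           // JS/TS: process.env.FOO
  /process\.env\[\s*['"]([A-Z_][A-Z0-9_]*)['"]\s*\]/g,           // JS/TS: process.env["FOO"]
  /os\.environ\[\s*['"]([A-Z_][A-Z0-9_]*)['"]\s*\]/g,            // Python: os.environ["FOO"]
  /os\.(?:environ\.get|getenv)\(\s*['"]([A-Z_][A-Z0-9_]*)['"]/g, // Python: os.environ.get / os.getenv
  /ENV(?:\.fetch\(|\[)\s*['"]([A-Z_][A-Z0-9_]*)['"]/g,           // Ruby: ENV["FOO"], ENV.fetch("FOO")
  /os\.(?:Getenv|LookupEnv)\(\s*"([A-Z_][A-Z0-9_]*)"/g,          // Go: os.Getenv("FOO")
];

// KEY=value lines (optionally prefixed by "export") in .env.example; "#" comment lines never match.
const ENV_EXAMPLE_KEY = /^[ \t]*(?:export[ \t]+)?([A-Za-z_][A-Za-z0-9_]*)[ \t]*=/gm;

// files: [{ path, text }] -> [{ name, files }] sorted by name, for deterministic output.
function collectEnvVars(files) {
  const found = new Map(); // variable name -> Set of paths that reference it
  const add = (name, path) => {
    if (!found.has(name)) found.set(name, new Set());
    found.get(name).add(path);
  };

  for (const { path, text } of files) {
    if (/(^|\/)\.env\.example$/.test(path)) {
      for (const m of text.matchAll(ENV_EXAMPLE_KEY)) add(m[1], path);
      continue; // .env.example declares keys; it does not "read" them
    }
    for (const pattern of ENV_READ_PATTERNS) {
      for (const m of text.matchAll(pattern)) add(m[1], path);
    }
  }

  return [...found.keys()]
    .sort()
    .map((name) => ({ name, files: [...found.get(name)].sort() }));
}
```

Sorting both the names and the per-variable file lists keeps the output byte-identical across runs, which matches the changelog's "deterministic" constraint.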
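The C2 entry lists four cross-language naming conventions that `findRelatedTests` now normalizes (`test_x.py`↔`x.py`, `x_test.go`↔`x.go`, `XTest.java`↔`X.java`, `x.spec.ts`↔`x.ts`). One way to sketch that normalization is to reduce every path to a stem, an extension, and an is-test flag, then pair files that share stem and extension. `canonicalize` and `relatedTests` are hypothetical names, and ignoring directories is a simplifying assumption; the real `findRelatedTests` may use more signals.

```javascript
// Hypothetical sketch: reduce a path to { stem, ext, isTest } for the four conventions
// named in the C2 entry. Directories are ignored, which is a simplifying assumption.
function canonicalize(path) {
  const file = path.split("/").pop();
  const dot = file.lastIndexOf(".");
  const base = dot > 0 ? file.slice(0, dot) : file;
  const ext = dot > 0 ? file.slice(dot) : "";

  if (ext === ".py" && base.startsWith("test_")) {
    return { stem: base.slice(5), ext, isTest: true };      // test_x.py -> x
  }
  if (ext === ".go" && base.endsWith("_test")) {
    return { stem: base.slice(0, -5), ext, isTest: true };  // x_test.go -> x
  }
  if (ext === ".java" && base.endsWith("Test") && base.length > 4) {
    return { stem: base.slice(0, -4), ext, isTest: true };  // XTest.java -> X
  }
  if (base.endsWith(".spec")) {
    return { stem: base.slice(0, -5), ext, isTest: true };  // x.spec.ts -> x
  }
  return { stem: base, ext, isTest: false };
}

// Return candidate test files whose normalized stem and extension match the implementation file.
function relatedTests(implPath, candidatePaths) {
  const impl = canonicalize(implPath);
  if (impl.isTest) return []; // a test file has no "related tests" of its own here
  return candidatePaths
    .filter((p) => {
      const c = canonicalize(p);
      return c.isTest && c.stem === impl.stem && c.ext === impl.ext;
    })
    .sort();
}
```

Matching on extension as well as stem keeps `parser_test.go` from being paired with `parser.py`; a stem-only match would be looser and produce cross-language false positives.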
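The C3 entry gives the label set and two precedence examples: a migration touching auth stays `migration`, and payment/auth outrank the generic `security` bucket. A first-match-wins rule list captures that ordering. This sketch assumes the labels are checked in the order the changelog lists them; `riskLabelSketch` and every path regex are illustrative guesses, not the patterns `riskLabelFor` actually uses.

```javascript
// Hypothetical sketch of first-match-wins risk labelling. Rule order follows the label
// list in the C3 entry; the regexes are illustrative assumptions and deliberately naive.
const RISK_RULES = [
  ["migration", /(^|\/)(migrations?|migrate|alembic)\//i],
  ["payment", /payment|billing|checkout|invoice|stripe/i],
  ["auth", /auth|login|session|oauth|jwt|password/i],
  ["security", /security|crypto|secret|permission|acl/i],
  ["public-api", /(^|\/)(api|routes?|controllers?)\//i],
  ["config", /(^|\/)(config|conf)\/|\.(ya?ml|toml|ini)$|(^|\/)\.env/i],
  ["test", /(^|\/)(tests?|__tests__|spec)\/|\.(test|spec)\.[jt]sx?$|(^|\/)test_[^/]*\.py$|_test\.go$/i],
  ["generated", /(^|\/)(dist|build|generated)\/|\.min\.js$|\.pb\.go$/i],
];

function riskLabelSketch(path) {
  // Most specific risk first: the first rule whose pattern matches decides the label.
  for (const [label, pattern] of RISK_RULES) {
    if (pattern.test(path)) return label;
  }
  return "source"; // fallback when nothing riskier matched
}
```

Keeping the rules in a single ordered array makes the precedence explicit: changing which risk wins means moving a row rather than rewriting nested conditionals.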

package/README.md CHANGED

@@ -57,10 +57,10 @@ That map is exactly what agentic grep is worst at: reproducible, auditable conte
 **Proof it pays off** (full benchmark below):
 <!--SM:whyMetrics-->
-- **75.6% hit@5** — right file found in top 5 results (vs 13.6% baseline)
+- **86.7% hit@5** — right file found in top 5 results (vs 13.6% baseline)
 - **97.0% token reduction** — average across 21 real repos
-- **52.2% task success rate** — up from 10% without context
-- **1.72 prompts per task** — down from 2.84 (39.4% fewer retries)
+- **67.8% task success rate** — up from 10% without context
+- **1.46 prompts per task** — down from 2.84 (48.8% fewer retries)
 <!--/SM:whyMetrics-->
 - **<!--SM:languages-->33<!--/SM:languages--> languages supported** — TypeScript, Python, Go, Rust, Java, R, and more
 - **No vendor lock-in** — works with any AI assistant or local LLM
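The 7.31.0 changelog entry attributes the hit@5 rise from 75.6% to 86.7% to the identifier-aware BM25 re-ranker, built from four ingredients: camelCase/snake_case splitting, light stemming, a 3× filename boost, and length-normalized BM25. The ranker can be sketched as follows. The names `tokenize`, `stem`, `rankBM25`, and `PATH_BOOST`, the plural-stripping stemmer, the `k1 = 1.2` / `b = 0.75` defaults, and the `ln(1 + (N - n + 0.5) / (n + 0.5))` IDF variant are all assumptions, not details read from `src/retrieval/bm25.js`.

```javascript
// Hypothetical sketch of an identifier-aware BM25 ranker (not sigmap's src/retrieval/bm25.js).
const PATH_BOOST = 3; // filename tokens are counted three times, per the changelog's "3×"

// Light stemming: strip a trailing plural "s" ("emits" -> "emit"), but keep words ending in "ss".
function stem(token) {
  return token.length > 3 && token.endsWith("s") && !token.endsWith("ss") ? token.slice(0, -1) : token;
}

// Split camelCase, acronym runs, snake_case and punctuation, then lower-case and stem.
function tokenize(text) {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")    // componentEmits -> component Emits
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTTPServer -> HTTP Server
    .split(/[^A-Za-z0-9]+/)                    // "_" and punctuation act as separators
    .filter(Boolean)
    .map((t) => stem(t.toLowerCase()));
}

// docs: [{ path, text }] -> [{ path, score }], highest score first, ties broken by path.
function rankBM25(query, docs, { k1 = 1.2, b = 0.75 } = {}) {
  const docTokens = docs.map(({ path, text }) => {
    const name = path.split("/").pop().replace(/\.[^.]*$/, ""); // filename without extension
    const nameTokens = tokenize(name);
    const boosted = [];
    for (let i = 0; i < PATH_BOOST; i++) boosted.push(...nameTokens);
    return [...tokenize(text), ...boosted];
  });
  const N = docs.length;
  const avgdl = docTokens.reduce((sum, toks) => sum + toks.length, 0) / N || 1;

  // Document frequency: how many documents contain each term at least once.
  const df = new Map();
  for (const toks of docTokens) {
    for (const t of new Set(toks)) df.set(t, (df.get(t) || 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  const scored = docs.map(({ path }, i) => {
    const toks = docTokens[i];
    const tf = new Map();
    for (const t of toks) tf.set(t, (tf.get(t) || 0) + 1);
    let score = 0;
    for (const q of queryTerms) {
      const f = tf.get(q) || 0;
      if (f === 0) continue;
      const n = df.get(q);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const norm = k1 * (1 - b + (b * toks.length) / avgdl); // document-length normalization
      score += (idf * f * (k1 + 1)) / (f + norm);
    }
    return { path, score };
  });

  // Deterministic order: higher score first, then lexicographic path.
  return scored.sort((x, y) => y.score - x.score || (x.path < y.path ? -1 : x.path > y.path ? 1 : 0));
}
```

With the motivating query `component emit`, a file named `componentEmits.ts` now shares the split tokens `component` and `emit` with the query, so it outranks files that share none, even though plain exact-token matching would find no overlap at all.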
@@ -74,7 +74,7 @@ That map is exactly what agentic grep is worst at: reproducible, auditable conte
 | Without SigMap | With SigMap |
 |---|---|
 | ❌ Non-reproducible agent guesses | ✅ Deterministic map — same input, same output, every time |
-| ❌ "Trust me" AI answers | ✅ Grounded — right file in context <!--SM:hitWhole-->76%<!--/SM:hitWhole--> of the time, every symbol on a real line anchor |
+| ❌ "Trust me" AI answers | ✅ Grounded — right file in context <!--SM:hitWhole-->87%<!--/SM:hitWhole--> of the time, every symbol on a real line anchor |
 | ❌ Embeddings / vector DB required | ✅ Zero deps, no infra, fully offline |
 ---
@@ -98,13 +98,13 @@ Ask → Rank → Context → Validate → Judge → Learn
 <!--SM:benchmarkBlock-->
 ```
-Benchmark : sigmap-v7.30-main (21 repositories, including R language)
-Date      : 2026-06-23
+Benchmark : sigmap-v8.0-main (21 repositories, including R language)
+Date      : 2026-07-04
-Hit@5          : 75.6%   (baseline 13.6%  — 5.6× lift)
+Hit@5          : 86.7%   (baseline 13.6%  — 6.4× lift)
 Token reduction: 97.0%   (across 21 repos)
-Prompt reduction : 39.4% (2.84 → 1.72 prompts per task)
-Task success   : 52.2%   (baseline 10%)
+Prompt reduction : 48.8% (2.84 → 1.46 prompts per task)
+Task success   : 67.8%   (baseline 10%)
 Repos tested   : 21 (JavaScript, Python, Go, Rust, Java, R, C++, C#, Dart, Swift, Ruby, PHP, Scala, Kotlin, and more)
 ```
 <!--/SM:benchmarkBlock-->
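The Hit@5 line in the benchmark block reports a hit rate and a lift over the 13.6% baseline. Assuming a task counts as a hit when its expected file appears among the top k ranked results (the README's "right file found in top 5 results") and that lift is the plain ratio of the two rates, the arithmetic can be sketched with the hypothetical helpers `hitAtK`, `lift`, and `formatLift`; the benchmark runner's actual code may differ.

```javascript
// Hypothetical helpers illustrating hit@k and lift; not taken from sigmap's benchmark runner.

// tasks: [{ ranked: [path, ...], expected: path }] -> fraction of tasks with a hit in the top k.
function hitAtK(tasks, k) {
  if (tasks.length === 0) return 0;
  const hits = tasks.filter((t) => t.ranked.slice(0, k).includes(t.expected)).length;
  return hits / tasks.length;
}

// Lift is the rate with context divided by the baseline rate.
function lift(rate, baseline) {
  return rate / baseline;
}

// Format a lift with one decimal place, the way the benchmark block shows it.
function formatLift(rate, baseline) {
  return `${lift(rate, baseline).toFixed(1)}×`;
}
```

For the published figures, `lift(0.867, 0.136)` is about 6.375, which rounds to the 6.4× shown, and the previous `lift(0.756, 0.136)` is about 5.56, matching the earlier 5.6×.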