pantheon-guard 0.4.0-pre.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. package/CHANGELOG.md +572 -0
  2. package/LICENSE-COMMERCIAL.md +76 -0
  3. package/LICENSE-MIT.md +21 -0
  4. package/PITCH.md +301 -0
  5. package/README.md +410 -0
  6. package/README.ru.md +345 -0
  7. package/dist/index.cjs +1 -0
  8. package/dist/index.d.cts +3718 -0
  9. package/dist/index.d.ts +3718 -0
  10. package/dist/index.mjs +1 -0
  11. package/docs/CONFORMAL.md +254 -0
  12. package/docs/DISTRIBUTION-SHIFT-PAC-BAYES.md +148 -0
  13. package/docs/LEARNING.md +59 -0
  14. package/docs/MINIMAX-BENCHMARK.md +175 -0
  15. package/docs/PAC-BAYES-BOUND.md +284 -0
  16. package/docs/PHILOSOPHY.md +96 -0
  17. package/docs/SECURITY.md +246 -0
  18. package/docs/distshift_pac_bayes_compute.py +77 -0
  19. package/docs/pac_bayes_compute.py +117 -0
  20. package/examples/adversarial-corpus.js +225 -0
  21. package/examples/anthropic-chat.js +84 -0
  22. package/examples/basic.js +107 -0
  23. package/examples/benchmark-comparison-baseline.js +226 -0
  24. package/examples/benchmark-multiregion-corpus.js +730 -0
  25. package/examples/benchmark-multiregion-runner.js +127 -0
  26. package/examples/benchmark-phase1-corpus.js +888 -0
  27. package/examples/benchmark-phase1-runner.js +178 -0
  28. package/examples/chrome-extension/README.md +50 -0
  29. package/examples/chrome-extension/content.js +28 -0
  30. package/examples/chrome-extension/manifest.json +21 -0
  31. package/examples/cli-demo.js +87 -0
  32. package/examples/conformal-data.json +41 -0
  33. package/examples/conformal-demo.js +110 -0
  34. package/examples/epistemology-fixtures.js +132 -0
  35. package/examples/healthcare-pack-demo.js +110 -0
  36. package/examples/nemo-output-rail/README.md +119 -0
  37. package/examples/nemo-output-rail/adversarial.txt +5 -0
  38. package/examples/nemo-output-rail/baseline.yml +19 -0
  39. package/examples/nemo-output-rail/config.yml +23 -0
  40. package/examples/nemo-output-rail/pantheon-rail.py +100 -0
  41. package/examples/nemo-output-rail/run.sh +30 -0
  42. package/examples/news-pack-real-coverage.js +123 -0
  43. package/examples/openai-chat.js +86 -0
  44. package/examples/real-news-corpus.js +191 -0
  45. package/examples/real-world-domain-tests.js +166 -0
  46. package/package.json +67 -0
  47. package/src/algorithm.js +267 -0
  48. package/src/calibrator.js +153 -0
  49. package/src/conformal-weighted.js +215 -0
  50. package/src/conformal.js +202 -0
  51. package/src/constants.js +31 -0
  52. package/src/detect-patterns.js +155 -0
  53. package/src/index.js +124 -0
  54. package/src/inspect.js +99 -0
  55. package/src/integrity.js +107 -0
  56. package/src/laws.js +76 -0
  57. package/src/learning/README.md +39 -0
  58. package/src/learning/index.cjs +1099 -0
  59. package/src/mahavrata.js +173 -0
  60. package/src/normalize.js +179 -0
  61. package/src/packs/epistemology.js +261 -0
  62. package/src/packs/healthcare.js +298 -0
  63. package/src/packs/index.js +249 -0
  64. package/src/packs/news-de.js +182 -0
  65. package/src/packs/news.js +563 -0
  66. package/src/principles.js +119 -0
  67. package/src/sign.js +151 -0
  68. package/src/svadharma.js +141 -0
  69. package/src/wrap-agent.js +73 -0
package/CHANGELOG.md ADDED
@@ -0,0 +1,572 @@
1
+ # Changelog
2
+
3
+ All notable changes to `pantheon-guard` will be documented here.
4
+ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
5
+ and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
+
7
+ ## [Unreleased]
8
+
9
+ ## [0.4.0-pre.3] — 2026-05-07
10
+
11
+ ### Released — first npm publication
12
+
13
+ - Package published to npm registry under `next` dist-tag.
14
+ - `npm install pantheon-guard@next` now works without GitHub-tag fallback.
15
+ - `package.json` description updated to clarify positioning: **content-safety
16
+ for AI agent output**, distinct from input-attack guardrails (NeMo, Bedrock,
17
+ Lakera, Llama-Guard).
18
+
19
+ ### Added — cross-language benchmark (N=509, pre-registered)
20
+
21
+ Multi-region production headlines from 12 RSS sources across RU + DE + EN/UK,
22
+ with labels assigned BEFORE running guard and SHA-256 hash captured per
23
+ corpus file. Aggregate accuracy 92.5% (Wilson 95% CI [89.9%, 94.5%]),
24
+ FP-rate 0.2%. Per-region breakdown:
25
+
26
+ - Russian (N=280): 95.7% accuracy, FP=0.4%
27
+ - German (N=100): 96.0% accuracy, FP=0.0%
28
+ - English/UK (N=129): ~90% accuracy, FP=0.0%
29
+
30
+ All three pre-registered hypotheses HOLD (accuracy ≥85%, mainstream FP ≤5%,
31
+ tabloid catch ≥60%). Reproducible: `node examples/benchmark-phase1-runner.js`.
32
+
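The quoted confidence interval can be checked from the aggregate accuracy and N alone. The sketch below is not taken from the package: `wilsonInterval` is a hypothetical helper, and it assumes z = 1.96 for the 95% level.

```javascript
// Wilson score interval for a binomial proportion.
// pHat: observed proportion, n: sample size, z: normal quantile (1.96 for 95%).
function wilsonInterval(pHat, n, z = 1.96) {
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (pHat + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((pHat * (1 - pHat)) / n + z2 / (4 * n * n))) / denom;
  return { lower: center - half, upper: center + half };
}

// Aggregate figures quoted in the changelog: accuracy 92.5% on N = 509.
const ci = wilsonInterval(0.925, 509);
console.log(ci.lower.toFixed(3), ci.upper.toFixed(3));
```

With 0.925 and 509 as inputs, the bounds round to the [89.9%, 94.5%] range stated in the entry.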
33
+ ### Added — `newsPack` (closes solo-clickbait gap)
34
+
35
+ Domain pack for news / media AI output. Closes the gap documented in
36
+ `REAL-WORLD-DOMAIN-TESTS-2026-05-04.md` where standalone clickbait stacks
37
+ slipped through core detection because all hits routed to a single
38
+ `clickbait` flag and the meta-flag required ≥2 flags.
39
+
40
+ Approach: news-specific clickbait phrases route to `satya`,
41
+ anonymous-source phrases to `asteya`, panic framing to `ahimsa`, and
42
+ "before-it's-deleted" urgency to `indriya_nigraha`. Pack violations
43
+ block independently of the core meta-flag — a single hit fails.
44
+
45
+ Pattern coverage (RU + EN):
46
+ - Shocking-secret / hidden-truth / "secret nobody knows" framing
47
+ - "They don't want you to know" / "скрывают от народа" conspiracy frames
48
+ - "You won't believe" / "вы не поверите"
49
+ - "Media silence" / "о чём молчат СМИ" / "what the mainstream media won't tell you"
50
+ - "Doctors hate this" / "эксперты ненавидят"
51
+ - "Exposed!" / "разоблачение!" sensational-bang
52
+ - "Will change everything / the world / history"
53
+ - Anonymous "sources say" / "according to reports" — suppressed when a
54
+ named outlet (Reuters, Bloomberg, NYT, etc.) appears within 200 chars
55
+ - Panic framing in headlines/ledes
56
+ - "Read this before it's deleted" / "пока не удалили"
57
+
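The anonymous-source rule with named-outlet suppression could look roughly like the sketch below. The regexes, the outlet list, and the function name are illustrative assumptions, not the pack's actual patterns; the sketch also assumes the 200-character window applies on both sides of the hit.

```javascript
// Hypothetical patterns, for illustration only.
const ANON_SOURCE = /\b(?:sources say|according to reports)\b/gi;
const NAMED_OUTLET = /\b(?:Reuters|Bloomberg|NYT)\b/i; // no "g" flag: test() stays stateless
const WINDOW = 200; // characters inspected on each side of the hit

function anonymousSourceHits(text) {
  const hits = [];
  for (const m of text.matchAll(ANON_SOURCE)) {
    const start = Math.max(0, m.index - WINDOW);
    const end = Math.min(text.length, m.index + m[0].length + WINDOW);
    // Suppress the hit when a named outlet appears inside the window.
    if (!NAMED_OUTLET.test(text.slice(start, end))) hits.push(m[0]);
  }
  return hits;
}

console.log(anonymousSourceHits("Sources say the vault is empty."));
console.log(anonymousSourceHits("Sources say, per Reuters, the vault is empty."));
```

The first call reports one hit; the second reports none because an outlet is named nearby.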
58
+ Calibrator overrides: `NOISE_FLOOR: 0.20`, `STRONG_THRESHOLD: 0.55` —
59
+ same logic as healthcare pack (high downstream cost via virality).
60
+
61
+ Tests: 36 new fixtures including 3 regression cases for the documented
62
+ solo-clickbait gap (EN, RU, mixed-script bypass).
63
+
64
+ ## [0.4.0-pre.2] — 2026-05-04
65
+
66
+ ### Polish pass — fixed dead feature, hot-path perf, DRY, stale comments
67
+
68
+ A `simplify`-skill review across v0.3 + v0.4 surface flagged one real
69
+ correctness bug, three hot-path inefficiencies, and several stale
70
+ comments. All addressed in a single atomic commit; no API breakage.
71
+
72
+ #### Fixed — `calibratorOverrides` was documented but not wired
73
+
74
+ `healthcarePack.calibratorOverrides` declared `NOISE_FLOOR: 0.20` and
75
+ `STRONG_THRESHOLD: 0.55`, but `runPack` / `applyPack` / `stackPacks`
76
+ never read them — the override was silently ignored, and healthcare
77
+ detection ran with default core thresholds. **This was a bug**: the
78
+ killer feature for higher-stakes domains was non-functional.
79
+
80
+ Plumbed end-to-end:
81
+ - `calibrate(text, evidence, overrides?)` now accepts a partial override
82
+ map and merges with `CALIBRATOR_PARAMS` per call.
83
+ - `isStrong(c, threshold?)` now accepts an explicit threshold instead
84
+ of always reading the module-level constant.
85
+ - `detectPatternsCalibrated(text, {overrides})` plumbs through.
86
+ - `inspect(text, {calibratorOverrides})` plumbs through; the strong-
87
+ threshold used for boolean conversion respects the override.
88
+ - `applyPack(pack)` and `stackPacks(packs)` pass the pack's
89
+ `calibratorOverrides` (merged for stacks) into the inspect call.
90
+ - Two new tests in `test/packs-healthcare.test.js` verify end-to-end
91
+ override plumbing — including a thresholds-only mini-pack proving
92
+ packs with no patterns still take effect.
93
+
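The per-call merge and the explicit threshold described in this entry can be illustrated in a few lines. This is a minimal sketch under assumptions: the default values come from the changelog, but the object shape and the `resolveParams` helper are hypothetical, and the real `calibrate()` internals are not shown in this diff.

```javascript
// Defaults as quoted in the changelog (0.30 / 0.70); the object shape is assumed.
const CALIBRATOR_PARAMS = Object.freeze({ NOISE_FLOOR: 0.30, STRONG_THRESHOLD: 0.70 });

// Merge a partial override map over the frozen defaults, once per call.
function resolveParams(overrides = {}) {
  return { ...CALIBRATOR_PARAMS, ...overrides };
}

// An explicit threshold wins; otherwise fall back to the module-level default.
function isStrong(confidence, threshold = CALIBRATOR_PARAMS.STRONG_THRESHOLD) {
  return confidence >= threshold;
}

const healthcare = resolveParams({ NOISE_FLOOR: 0.20, STRONG_THRESHOLD: 0.55 });
console.log(isStrong(0.6));                              // default threshold 0.70
console.log(isStrong(0.6, healthcare.STRONG_THRESHOLD)); // pack threshold 0.55
```

A confidence of 0.6 is below the default threshold but above the healthcare one, which is exactly the case the unwired override used to miss.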
94
+ #### Fixed — double normalization in `applyPack` / `stackPacks` hot path
95
+
96
+ Previously `applyPack` called `coreInspect` (which normalized inside
97
+ `detectPatternsCalibrated`) and then `runPack` (which normalized again).
98
+ For `stackPacks` with N packs that was N+1 normalization passes per
99
+ inspect call. Normalization is the most expensive step in the
100
+ deterministic pipeline.
101
+
102
+ Fixed: `applyPack` and `stackPacks` compute `normalized` once, pass it
103
+ into both `coreInspect` (via new `options.normalized` hint) and
104
+ `runPack` (via new optional second argument). Single normalization
105
+ per inspect call regardless of pack stack depth.
106
+
107
+ #### Fixed — `normalize.js` ASCII fast-path
108
+
109
+ Three of the five normalization stages (homoglyph fold, leet, spaced
110
+ collapse) only matter for non-ASCII or special-character input. Each
111
+ now runs only when a cheap `RegExp.test()` confirms the relevant
112
+ characters are present. Pure-ASCII text (the majority of production
113
+ traffic) skips ~70% of normalization work. Same applies to the
114
+ zero-width strip — `.test()` first, `.replace()` only on hit.
115
+
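The test-before-replace idea can be shown on the zero-width strip alone. The character class and function name below are assumptions made for illustration; the package's own tables are not visible in this entry.

```javascript
// Illustrative character class: zero-width space/joiners, word joiner, BOM.
const ZERO_WIDTH_PROBE = /[\u200B-\u200D\u2060\uFEFF]/;   // no "g": test() has no lastIndex state
const ZERO_WIDTH_ALL = /[\u200B-\u200D\u2060\uFEFF]/g;

function stripZeroWidth(text) {
  // Cheap probe first; the allocation-heavy replace() runs only on a hit.
  return ZERO_WIDTH_PROBE.test(text) ? text.replace(ZERO_WIDTH_ALL, "") : text;
}

console.log(stripZeroWidth("Hu\u200Brry")); // contains a zero-width space
console.log(stripZeroWidth("Hurry"));       // pure ASCII, returned as-is
```

Keeping the probe regex free of the `g` flag matters: a global regex carries `lastIndex` between `test()` calls and can give alternating results on repeated input.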
116
+ #### DRY
117
+
118
+ - `VALID_RULES` in `packs/index.js` now derived from
119
+ `Object.keys(MAHAVRATA.rules)` instead of hardcoded — adding/renaming
120
+ a mahā-vrata rule auto-propagates to pack validation.
121
+ - `VALID_SEVERITIES` extracted as a frozen constant; pack validation
122
+ + error messages reference it.
123
+ - `EMPTY_FLAGS` and `EMPTY_CALIBRATED` extracted as frozen module-level
124
+ constants in `detect-patterns.js`; previously rebuilt on every empty
125
+ call.
126
+
127
+ #### Cleanup
128
+
129
+ - Removed stale "lazy import to avoid circular dependency" comment in
130
+ `detect-patterns.js` — the import was never lazy and there was never
131
+ a cycle.
132
+ - Removed "v0.2 will replace this", "v0.3 will fit it", "Acceptable for
133
+ v0.1" version-roadmap narration from comments — the calibrator
134
+ exists, the comments were misleading.
135
+ - Removed the eslint-disable comment that referenced a non-existent
136
+ `require()` pattern.
137
+ - Tightened `inspectWeightedConformal` test: now asserts
138
+ `r10.threshold <= r1.threshold` with derivation comment, instead of
139
+ only checking that `weightTest` is reflected in the output object.
140
+ Previous test would have passed even if `weightTest` were silently
141
+ ignored by the quantile.
142
+
143
+ #### Build delta
144
+
145
+ - ESM 62.89 → 63.61 KB (+0.7 KB for plumbing + EMPTY_* constants;
146
+ partially offset by stale-comment removal)
147
+ - DTS 94.25 → 90.54 KB (-3.7 KB after JSDoc cleanup)
148
+ - Tests 165 → 167 (added 2 calibratorOverrides verification tests)
149
+
150
+ #### Backward compatibility
151
+
152
+ All public exports unchanged. The only behavior change is that
153
+ `applyPack(healthcarePack)` (and any future pack with
154
+ `calibratorOverrides`) now actually applies its overrides — which was
155
+ the documented behavior all along.
156
+
157
+ ## [0.4.0-pre.1] — 2026-05-04
158
+
159
+ ### Added — domain rule-pack architecture + first pack (`healthcare`)
160
+
161
+ Introduces a composable extension point for domain-specific manipulation
162
+ detection. Packs add specialized regex patterns and positive requirements
163
+ on top of the deterministic mahā-vrata core, without altering it.
164
+
165
+ #### `src/packs/index.js` — pack architecture
166
+
167
+ Three composable pieces:
168
+
169
+ 1. **`detectionPatterns`** — regex matchers tagged with the existing
170
+ mahā-vrata rule they route through (ahimsa / satya / asteya /
171
+ shaucha / indriya_nigraha). Packs do NOT introduce new top-level
172
+ ethical categories — every domain harm maps onto a Yoga-sūtra rule
173
+ for principled audit consistency.
174
+ 2. **`requirements`** — domain-specific *positive* requirements (e.g.
175
+ "healthcare AI must include provider-escalation language when
176
+ discussing symptoms"). Each requirement has a `condition`, a
177
+ `check`, a severity, and an audit message.
178
+ 3. **`calibratorOverrides`** — per-pack tightening of calibration
179
+ thresholds. Higher-stakes domains use lower noise floors and lower
180
+ strong thresholds.
181
+
182
+ API:
183
+
184
+ - `applyPack(pack)` — wraps `inspect()` with a pack, returning an
185
+ enhanced inspect function. Backward compatible: core inspect()
186
+ unchanged; only callers that opt in see pack output.
187
+ - `stackPacks([pack1, pack2, ...])` — composes multiple packs (e.g.
188
+ healthcare + finance for medtech-fintech apps). Violations and
189
+ evidence merge into a single result object.
190
+ - `runPack(pack, text)` — direct pack run for testing / instrumentation.
191
+ - `validatePack(pack)` — defensive registration check.
192
+
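To make the three pieces concrete, here is a hypothetical mini-pack and a simplified runner. The top-level field names (`detectionPatterns`, `requirements`, `calibratorOverrides`, `condition`, `check`) follow the changelog; the per-entry fields, the regexes, and `runDemoPack` are assumptions and do not reproduce the package's `runPack`.

```javascript
// Hypothetical mini-pack; regexes are deliberately simplistic.
const demoPack = {
  name: "demo-healthcare",
  detectionPatterns: [
    { id: "cure_claim_en", rule: "satya", regex: /\b(?:guaranteed|100%) cure\b/i },
  ],
  requirements: [
    {
      id: "provider_escalation",
      severity: "high",
      message: "Medical text must direct the user to a healthcare provider.",
      condition: (text) => /\b(?:symptoms?|medication|treatment)\b/i.test(text),
      check: (text) => /\b(?:doctor|physician|healthcare provider)\b/i.test(text),
    },
  ],
  calibratorOverrides: { NOISE_FLOOR: 0.20, STRONG_THRESHOLD: 0.55 },
};

// Simplified stand-in for a pack run: collect pattern hits and unmet requirements.
function runDemoPack(pack, text) {
  const violations = pack.detectionPatterns
    .filter((p) => p.regex.test(text))
    .map((p) => ({ source: pack.name, id: p.id, rule: p.rule }));
  const unmet = pack.requirements
    .filter((r) => r.condition(text) && !r.check(text))
    .map((r) => ({ id: r.id, severity: r.severity, message: r.message }));
  return { blocked: violations.length > 0 || unmet.length > 0, violations, unmet };
}

console.log(runDemoPack(demoPack, "This is a guaranteed cure for your symptoms."));
console.log(runDemoPack(demoPack, "If symptoms persist, talk to your doctor."));
```

A requirement only fires when its `condition` matches and its `check` fails, so text that never touches the domain is left alone.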
193
+ #### `src/packs/healthcare.js` — first commercial pack
194
+
195
+ Healthcare-specific manipulation detection covering RU + EN:
196
+
197
+ | Category | Routes through | Patterns |
198
+ |---|---|---|
199
+ | Self-diagnosis overclaim | satya | `self_dx_en`, `self_dx_ru`, `self_dx_listicle_en` |
200
+ | Cure / 100% guarantee claim | satya | `cure_claim_en`, `cure_claim_ru` |
201
+ | False reassurance | satya | `false_reassurance_en`, `false_reassurance_ru` |
202
+ | Medication-adjustment advice | ahimsa | `med_adjust_en`, `med_adjust_ru`, `med_recommend_en` |
203
+ | Red-flag symptom dismissal | ahimsa | `serious_dismissal_en` |
204
+ | Off-prescription urgency | indriya_nigraha | `urgent_self_med_en` |
205
+ | "Studies show" without source | satya | `studies_show_en` |
206
+
207
+ Plus one **positive requirement**:
208
+ - `provider_escalation` — when text discusses symptoms / treatment /
209
+ medication / medical conditions, it MUST include language directing
210
+ the user to a healthcare provider. Failing this is a high-severity
211
+ violation. Implemented for both English and Russian.
212
+
213
+ Calibrator overrides for higher-stakes context:
214
+ - `NOISE_FLOOR`: 0.30 → 0.20
215
+ - `STRONG_THRESHOLD`: 0.70 → 0.55
216
+
217
+ 19 new tests in `test/packs-healthcare.test.js` covering each pattern,
218
+ the requirement in both languages, the positive (clean) cases, the
219
+ override behavior, and stacking. Suite now **165/165 passing**.
220
+
221
+ #### `examples/healthcare-pack-demo.js`
222
+
223
+ Runnable demo showing 9 representative inputs through both core
224
+ (`inspect`) and `applyPack(healthcarePack)`. Demonstrates the
225
+ commercial value: clean medical text passes both; manipulative or
226
+ unsafe text passes core but is blocked by the healthcare pack with
227
+ named pack-violation source and unmet-requirement id.
228
+
229
+ #### Why a new minor version
230
+
231
+ This is the first new architectural surface since v0.2: packs are an
232
+ extension point, not just a code addition. They open a commercial product
233
+ line (paid per-domain packs) that the OSS core monetizes through the
234
+ existing dual-license model.
235
+
236
+ ### Commercial — first paid pack pricing tier
237
+
238
+ `@pantheon/guard-healthcare`:
239
+ - Free: evaluation / pilot
240
+ - Starter: $499 / month (small healthtech, < $5M ARR)
241
+ - Enterprise: $4,990 / month + (large healthtech / hospital)
242
+ - Custom regulatory geography rules: negotiated
243
+
244
+ Same pattern will apply to upcoming packs:
245
+ - `@pantheon/guard-finance` — FOMO, pressure CTA, mandatory risk disclosure
246
+ - `@pantheon/guard-education` — child-safety, anti-comparative-ranking
247
+ - `@pantheon/guard-recruiting` — false-urgency-in-offers, salary disclosure
248
+
249
+ ### Backward compatibility
250
+
251
+ All v0.1, v0.2, v0.2.1, v0.2.2, v0.3.0 exports unchanged. Pack support
252
+ is purely additive. Existing `inspect()` / `inspectConformal()` /
253
+ `inspectSigned()` consumers see no behavior change.
254
+
255
+ ### Build delta
256
+
257
+ - ESM 53.07 KB → 56.5 KB (+3.5 KB for pack runtime + healthcare pack)
258
+ - Tests: 146 → 165
259
+
260
+ ## [0.3.0-pre.1] — 2026-05-04
261
+
262
+ ### Added — security hardening + watermarking layer
263
+
264
+ The v0.2.2 calibrated detector had real bypass vectors. Audit found
265
+ that with neutral metadata (`urgency: 0.3, paused: true`), the
266
+ following attacks let manipulative content through:
267
+
268
+ - Cyrillic / Greek homoglyph swaps (`Huррy` with Cyrillic `р`).
269
+ - Mixed homoglyphs in fear words (`rеgrеt` with Cyrillic `е`).
270
+ - Zero-width / BOM insertions (`Hu​rry`).
271
+ - Fullwidth Latin (`Ｈｕｒｒｙ`).
272
+ - Leetspeak (`y0u'll r3gr3t`).
273
+ - Spaced-out tokens (`H u r r y`).
274
+
275
+ v0.3.0-pre.1 closes all of them and adds two watermarking layers
276
+ on top.
277
+
278
+ #### `src/normalize.js` — text normalization layer
279
+
280
+ - NFKC unicode normalization (collapses fullwidth, ligatures, compat).
281
+ - Zero-width / BOM / bidi-override stripping.
282
+ - **Mixed-script homoglyph fold.** Cyrillic/Greek lookalikes are
283
+ folded to Latin only inside words containing both scripts; pure
284
+ Russian text passes through untouched, preserving Russian regex
285
+ matches.
286
+ - Leetspeak digit-to-letter fold (between letter neighbors only).
287
+ - Spaced-out single-letter collapse.
288
+ - Wired into `detectPatterns()` and `detectPatternsCalibrated()`.
289
+
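The mixed-script rule is the subtle part of this layer, so a small sketch may help. It assumes a five-entry lookalike table and a hypothetical `foldMixedScript` function; the package's real table and word segmentation may differ.

```javascript
// Tiny illustrative lookalike table (Cyrillic -> Latin); a real table is larger.
const CYRILLIC_TO_LATIN = new Map([
  ["\u0430", "a"],
  ["\u0435", "e"],
  ["\u043E", "o"],
  ["\u0441", "c"],
  ["\u0445", "x"],
]);
const HAS_LATIN = /\p{Script=Latin}/u;
const HAS_CYRILLIC = /\p{Script=Cyrillic}/u;

function foldMixedScript(text) {
  // Process one word (a run of letters) at a time.
  return text.replace(/\p{L}+/gu, (word) => {
    // Single-script words, including pure Russian, pass through untouched.
    if (!(HAS_LATIN.test(word) && HAS_CYRILLIC.test(word))) return word;
    return Array.from(word, (ch) => CYRILLIC_TO_LATIN.get(ch) ?? ch).join("");
  });
}

console.log(foldMixedScript("r\u0435gr\u0435t"));                     // Latin word with Cyrillic "е"
console.log(foldMixedScript("\u0441\u0435\u043A\u0440\u0435\u0442")); // pure Cyrillic word
```

Folding only inside mixed-script words is what keeps Russian-language regexes working: a fully Cyrillic word never enters the mapping branch.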
290
+ #### `src/sign.js` — verdict signing (watermark layer)
291
+
292
+ - `inspectSigned(text, { secret, ...opts })` — runs `inspect()` and
293
+ returns the verdict together with a HMAC-SHA-256 signature over a
294
+ canonical-JSON serialization of the payload. Includes timestamp,
295
+ library version, and signature version.
296
+ - `verifySignedVerdict(signed, secret)` — timing-safe verification.
297
+ Returns `{ valid, reason }`. Catches: tampered fields, wrong secret,
298
+ unknown library identifier, unsupported signature version.
299
+ - `canonicalize(value)` — deterministic JSON with keys sorted at every
300
+ level (exposed for callers needing the same canonicalization).
301
+ - `signPayload()` / `verifyPayload()` — low-level primitives.
302
+
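A rough sketch of the signing primitives follows, using Node's built-in `crypto` module. The function names mirror the changelog, but the signatures and the exact canonical form are assumptions; `undefined` values and non-JSON types are not handled here.

```javascript
const crypto = require("node:crypto");

// Deterministic JSON: object keys sorted at every level.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// HMAC-SHA-256 over the canonical serialization, hex-encoded.
function signPayload(payload, secret) {
  return crypto.createHmac("sha256", secret).update(canonicalize(payload)).digest("hex");
}

function verifyPayload(payload, signature, secret) {
  const expected = Buffer.from(signPayload(payload, secret), "hex");
  const given = Buffer.from(String(signature), "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

const sig = signPayload({ verdict: "block", flags: ["satya"] }, "demo-secret");
console.log(verifyPayload({ flags: ["satya"], verdict: "block" }, sig, "demo-secret"));
```

Because keys are sorted before hashing, the verification above succeeds even though the payload's key order differs from the one that was signed.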
303
+ #### `src/integrity.js` — frozen-rule hash (rule-watermark)
304
+
305
+ - `getIntegrity()` — returns SHA-256 hashes of frozen rule structures
306
+ (MAHAVRATA, LAWS, PRINCIPLES, FIVE_STEP_ALGORITHM, SVADHARMA_SCHEMA,
307
+ LAYERS, GUNAS, PRIORITY) and a separately-versioned hash of
308
+ CALIBRATOR_PARAMS.
309
+ - `assertRuleSetHash(expected)` — CI / startup integrity check; throws
310
+ on drift.
311
+ - `getBuildFingerprint()` — 16-char fingerprint combining rule and
312
+ calibrator hashes plus library version.
313
+
314
+ v0.3.0-pre.1 baseline:
315
+ ```
316
+ rule_set_hash: 1da1b908e3577579fb01e43811f255c4f772b4de5e96d20deb5c265f72797848
317
+ calibrator_params_hash: 718349b8fd5dbdb150da61c5b9e91aca18cd297be16ba49c44002b6613ad5664
318
+ build_fingerprint: 1434724a34f04e30
319
+ ```
320
+
321
+ #### Tests
322
+
323
+ - 22 `test/adversarial.test.js` — systematic bypass attempts per rule,
324
+ including all v0.2.2 audit findings + ReDoS stress tests.
325
+ - 14 `test/sign.test.js` — round-trip, tampering rejection, timing-safe
326
+ on length mismatch, signature-version + library-id rejection.
327
+ - 7 `test/integrity.test.js` — hash stability, mismatch detection,
328
+ malformed hash rejection.
329
+ - Suite now **146/146 passing** (was 103).
330
+
331
+ #### Docs
332
+
333
+ - `docs/SECURITY.md` — threat model (4 adversary classes), defense
334
+ layer mapping, audit transcript (v0.2.2 → v0.3.0), watermarking
335
+ comparison to LLM output watermarks.
336
+
337
+ ### Why this is a 0.3.x bump (not 0.2.3)
338
+
339
+ Three new public modules (`normalize`, `sign`, `integrity`), 11 new
340
+ exports, and a behavior change in `detectPatterns()` (now applies
341
+ normalization before regex). Backward-compatible at the level of
342
+ *intent* (clean text still produces clean verdicts; bypass attempts
343
+ now produce blocking verdicts) — but the *behavior* on adversarial
344
+ input changes by design.
345
+
346
+ ### Build delta
347
+
348
+ - ESM 49.04 KB → ~52 KB (+3 KB for normalize + sign + integrity)
349
+ - CJS comparable
350
+ - Tests 103 → 146
351
+
352
+ ## [0.2.2-pre.1] — 2026-05-04
353
+
354
+ ### Added — three more theorems closing the formal-guarantees suite
355
+
356
+ - `src/conformal-weighted.js` — weighted conformal prediction under
357
+ covariate shift (Tibshirani, Foygel-Barber, Candès, Ramdas, NeurIPS
358
+ 2019). Caller supplies importance weights `w(x_i) = dP_test/dP_cal`
359
+ per calibration point and an optional `weightTest`. The threshold
360
+ becomes the weighted (1-α-p_test) empirical quantile, restoring
361
+ marginal coverage under any `P_test ≪ P_cal`.
362
+ - New API: `fitWeightedConformal()`, `inspectWeightedConformal()`,
363
+ `weightedQuantile()` (low-level, exposed for advanced callers).
364
+ - 10 new tests including coverage check under simulated shift.
365
+ Suite now 103/103 passing.
366
+ - `docs/DISTRIBUTION-SHIFT-PAC-BAYES.md` + `distshift_pac_bayes_compute.py`
367
+ — Germain–Habrard–Laviolette–Morvant 2016/2020 extension of the
368
+ McAllester bound to the case `P_bench ≠ P_prod`. Adds
369
+ `√(D₂(Q‖P) / 2) + λ` shift-correction term. Headline numerical
370
+ instantiation: at base bound = 0.093, total widens to 0.32 under
371
+ mild shift (`D₂=0.1`) and saturates near `D₂=2`.
372
+ - `docs/MINIMAX-BENCHMARK.md` — Sion's minimax theorem (1958)
373
+ applied to v0.3 benchmark design. Pre-commits category × language
374
+ budget in git, publishes worst-case stress-test gap alongside
375
+ every metric. Certifies that the test distribution lies near a
376
+ saddle point — publisher cannot retroactively cherry-pick.
377
+ - PITCH.md sections 2.1.3, 2.1.4, 2.1.5 — three new sub-sections
378
+ on distribution-shift PAC-Bayes, Sion-minimax benchmark, and the
379
+ full seven-guarantee defense-in-depth table.
380
+
381
+ ### The seven-guarantee suite (complete after v0.2.2)
382
+
383
+ | Layer | Theorem |
384
+ |---|---|
385
+ | Mahā-vrata | (axiomatic) Yoga-sūtra II.30-31 |
386
+ | Calibration | Cox 1946 + de Finetti 1937 |
387
+ | PAC-Bayes (aggregate) | McAllester 1999 / Catoni 2007 |
388
+ | Distribution-shift PAC-Bayes | Germain et al. 2016/2020 |
389
+ | Conformal (per-instance) | Vovk 1999 / 2005 |
390
+ | Weighted conformal | Tibshirani et al. 2019 |
391
+ | Benchmark design (Sion-minimax) | Sion 1958 |
392
+
393
+ ### Build delta
394
+
395
+ - ESM 47.01 KB → 48.5 KB (+1.5 KB for weighted conformal)
396
+ - Tests: 93 → 103
397
+ - New docs: 3 (DISTRIBUTION-SHIFT-PAC-BAYES, MINIMAX-BENCHMARK,
398
+ weighted-conformal section in CONFORMAL.md)
399
+
400
+ ### Backward compatibility
401
+
402
+ All v0.1, v0.2.0-pre.1, v0.2.1-pre.1 exports unchanged. Weighted
403
+ conformal is strictly additive; standard `inspectConformal()` continues
404
+ to work.
405
+
406
+ ## [0.2.1-pre.1] — 2026-05-04
407
+
408
+ ### Added — conformal prediction layer
409
+
410
+ - `src/conformal.js` — split conformal prediction wrapper over the v0.2
411
+ calibrator. Distribution-free finite-sample marginal coverage
412
+ guarantee per Vovk, Gammerman, Shafer (2005). For exchangeable
413
+ calibration data, the prediction set covers the true label with
414
+ probability ≥ 1-α regardless of underlying model accuracy.
415
+ - `fitConformal(calibrationSet, options)` — offline fit; computes the
416
+ finite-sample quantile threshold and returns a calibrator object
417
+ with explicit coverage guarantee.
418
+ - `inspectConformal(text, options)` — request-time wrapper around
419
+ `inspect()`. Returns `verdict_set ⊆ {manipulation, safe}` plus the
420
+ full inspect() output. Three set shapes map cleanly onto
421
+ block/pass/abstain actions; abstain is the certified-uncertainty
422
+ signal no other guardrail vendor offers.
423
+ - `nonconformityScore(text, label, options)` — exposed for advanced
424
+ callers building custom score functions or weighted variants.
425
+ - `examples/conformal-data.json` — 32 hand-labelled calibration
426
+ examples (RU+EN, balanced manipulation/safe). Production use should swap
427
+ in the v0.3 hand-labelled benchmark (~1000 examples), at which point
428
+ the marginal coverage guarantee becomes meaningfully tight.
429
+ - `examples/conformal-demo.js` — live demonstration of the three
430
+ verdict shapes plus held-out empirical coverage check (8/8 covered
431
+ at α=0.2 in the bundled split).
432
+ - `docs/CONFORMAL.md` — formal theorem statement, mapping onto guard,
433
+ comparison with PAC-Bayes (defense-in-depth pair), references
434
+ to Tibshirani 2019 (covariate shift) and Gibbs 2021 (online
435
+ adaptive) for v0.4 extensions.
436
+ - 14 new tests including empirical-coverage check on held-out split.
437
+ Suite now 93/93 passing.
438
+
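The finite-sample quantile and the three verdict shapes can be sketched with the standard split-conformal recipe. The helper names below are hypothetical, the score function is supplied by the caller, and treating an empty set as abstain is an assumption rather than documented behavior.

```javascript
// Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
// calibration score, or Infinity when that rank exceeds n.
function conformalThreshold(calibrationScores, alpha) {
  const sorted = [...calibrationScores].sort((a, b) => a - b);
  const rank = Math.ceil((sorted.length + 1) * (1 - alpha));
  return rank > sorted.length ? Infinity : sorted[rank - 1];
}

// A label joins the set when its nonconformity score is within the threshold.
// scoreFor is a caller-supplied function: (text, label) -> number.
function verdictSet(text, threshold, scoreFor) {
  return ["manipulation", "safe"].filter((label) => scoreFor(text, label) <= threshold);
}

// Map set shape to an action; the empty set is treated as abstain (assumption).
function actionFor(set) {
  if (set.length === 1) return set[0] === "manipulation" ? "block" : "pass";
  return "abstain";
}

const threshold = conformalThreshold([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], 0.25);
const toyScore = (_text, label) => (label === "manipulation" ? 0.1 : 0.9);
console.log(threshold, actionFor(verdictSet("Hurry, last chance!", threshold, toyScore)));
```

With very small calibration sets the rank exceeds n and the threshold becomes infinite, so every label is admitted and the result is always abstain; this is why the entry notes that the guarantee tightens only with a larger benchmark.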
439
+ ### Why conformal in addition to PAC-Bayes (not instead of)
440
+
441
+ The two bounds form a defense-in-depth pair, not redundancy:
442
+
443
+ | Layer | Type of guarantee | Right context |
444
+ |---|---|---|
445
+ | PAC-Bayes (v0.2.0-pre.1) | average risk gap | aggregate claim on benchmark page |
446
+ | Conformal (v0.2.1-pre.1) | per-instance coverage | production request-time decision |
447
+
448
+ Neither subsumes the other. PAC-Bayes asks "how good is the calibrator
449
+ on average?" — the right tool for PITCH/benchmark numbers. Conformal
450
+ asks "what does the calibrator honestly know about *this* input?" —
451
+ the right tool for production routing decisions. Section 2.1 of PITCH
452
+ now references both as a complementary pair.
453
+
454
+ ### Build delta
455
+
456
+ - ESM 45.59 KB → 47.01 KB (+1.42 KB)
457
+ - CJS 45.74 KB → 47.18 KB (+1.44 KB)
458
+ - DTS 46.08 KB → 53.30 KB
459
+
460
+ 1.4 KB of code for one of the strongest formal guarantees in
461
+ machine-learning theory. The ratio is the point.
462
+
463
+ ### Backward compatibility
464
+
465
+ All v0.1 and v0.2.0-pre.1 exports unchanged. `inspect()` continues
466
+ to work without conformal; `inspectConformal()` is strictly additive.
467
+
468
+ ## [0.2.0-pre.1] — 2026-05-04
469
+
470
+ ### Added — calibration layer
471
+
472
+ - `src/calibrator.js` — deterministic v0.2 calibration. Maps raw regex
473
+ evidence to per-flag confidence in [0, 1] using a saturating combiner
474
+ with short-text penalty and noise floor. Zero dependencies, ~150 lines.
475
+ - `detectPatternsCalibrated(text)` — v0.2 detector. Returns the same
476
+ boolean shape as `detectPatterns` for backward compatibility, plus
477
+ per-flag confidence, evidence markers naming which sub-patterns fired,
478
+ and an `abstain` decision when the input is too thin.
479
+ - `inspect(text, options)` — top-level v0.2 API that runs the full
480
+ pipeline (detect → calibrate → checkMahavrata) in one call, with
481
+ selectable decision policy: `'strict'` reproduces v0.1 behavior;
482
+ `'calibrated'` (default) requires confidence ≥ 0.7 for a flag to
483
+ trigger, and abstains on too-short input.
484
+ - 19 new tests (`calibrator.test.js`, `inspect.test.js`) verifying
485
+ monotonicity in hits, abstain on short input, calibrated-vs-strict
486
+ divergence on weak signals, evidence-marker shape, and confidence
487
+ range invariants. Total suite now 79 tests.
488
+
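One possible shape for a saturating combiner with a short-text penalty and a noise floor is sketched below. Only the parameter names `TAU`, `BASE_PER_HIT`, and `NOISE_FLOOR` come from the changelog; the formula, the values, and `MIN_TOKENS` are illustrative assumptions, and the package's calibrator may be built differently.

```javascript
// Illustrative parameters; values are assumptions, not the package's constants.
const PARAMS = { BASE_PER_HIT: 0.5, TAU: 1.0, NOISE_FLOOR: 0.30, MIN_TOKENS: 5 };

function calibrateFlag(hits, tokenCount, params = PARAMS) {
  // Saturating combiner: each extra hit adds less, and the result stays below 1.
  const raw = 1 - Math.exp(-(hits * params.BASE_PER_HIT) / params.TAU);
  // Short-text penalty: scale confidence down linearly below MIN_TOKENS.
  const penalty = Math.min(1, tokenCount / params.MIN_TOKENS);
  const confidence = raw * penalty;
  // Noise floor: anything below it is reported as zero.
  return confidence < params.NOISE_FLOOR ? 0 : confidence;
}

console.log(calibrateFlag(1, 20), calibrateFlag(3, 20), calibrateFlag(3, 1));
```

The function is monotone in the hit count and bounded in [0, 1), which are the two invariants the new tests in this entry are described as checking.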
489
+ ### Why this version exists
490
+
491
+ A controlled experiment in
492
+ `C:\ProjectS\glyph_reconstruction\REPORT_PHASE2.md` measured a
493
+ sparsity-regularized classifier producing 33.6% confident-but-wrong
494
+ answers in the underdetermined regime. That is the failure mode every
495
+ competing guardrail also exhibits but does not surface to callers.
496
+ v0.2 takes the lesson directly: confidence is a property of the input
497
+ regime, not the model. The calibrator surfaces it; `inspect()` lets
498
+ the caller choose whether to act on uncertain signals or abstain.
499
+
500
+ This positions calibrated honest-uncertainty as the differentiating
501
+ property of `pantheon-guard`, replacing the v0.1 placeholder roadmap
502
+ note about "trained classifier coming in v0.2" with a deterministic
503
+ calibration layer that ships now and stays auditable forever.
504
+
505
+ ### Backward compatibility
506
+
507
+ - All v0.1 exports unchanged — `detectPatterns`, `checkMahavrata`,
508
+ `runFiveSteps`, `checkAction`, `wrapAgent`, etc.
509
+ - `inspect()` is additive; no existing code paths altered.
510
+ - 60 prior tests still pass identically.
511
+
512
+ ### Known limitations
513
+
514
+ - Calibration constants (`TAU`, `BASE_PER_HIT`, etc. in
515
+ `CALIBRATOR_PARAMS`) are heuristic v0.2 baselines. v0.3 will fit
516
+ them to BENCHMARK ground truth via logistic regression.
517
+ - The abstain decision uses token count; future revisions may add
518
+ context features (caps ratio, punctuation density, sentence count).
519
+
520
+ ## [0.1.0] — Initial extraction
521
+
522
+ ### Added
523
+
524
+ - Initial extraction of the Pantheon deterministic conscience layer from
525
+ the production Avito Chrome extension into a standalone npm package.
526
+ - Seven focused source modules: `constants`, `mahavrata`, `svadharma`,
527
+ `algorithm`, `principles`, `laws`, `index`.
528
+ - Public functions:
529
+ - `checkMahavrata(action)` — five-yama deterministic check
530
+ - `validateSvadharma(svadharma)` — agent formula validation
531
+ - `checkSvadharmaConsistency(svadharma, action)` — fit check
532
+ - `runFiveSteps(agent, action)` / `checkAction(...)` — full algorithm
533
+ - `detectPatterns(text)` — regex heuristics for RU + EN manipulation
534
+ - `wrapAgent(agent).act(action, executor)` — runtime guard wrapper
535
+ - `getMahavrata()`, `getAlgorithm()`, `getPrinciple()`, `getLaw()`
536
+ - Frozen exported structures: `MAHAVRATA`, `SVADHARMA_SCHEMA`,
537
+ `FIVE_STEP_ALGORITHM`, `PRINCIPLES`, `LAWS`, plus `LAYERS`, `GUNAS`,
538
+ `PRIORITY` enums.
539
+ - Dual ESM + CJS build via `tsup`, with `.d.ts` and `.d.cts` types.
540
+ - 60 unit tests (Node test runner, `node:test`).
541
+ - Examples:
542
+ - `basic.js` — minimal hello-world
543
+ - `openai-chat.js` — OpenAI guarded chat with regenerate-on-block
544
+ - `anthropic-chat.js` — Anthropic equivalent
545
+ - `nemo-output-rail/` — full NeMo Guardrails integration with
546
+ side-by-side baseline + guarded demo
547
+ - `chrome-extension/` — minimal MV3 demo
548
+ - Documentation:
549
+ - `README.md` (English) and `README.ru.md` (Russian)
550
+ - `PITCH.md` — strategic one-pager
551
+ - `docs/PHILOSOPHY.md` — engineering rationale for the rule choice
552
+ - `docs/LEARNING.md` — status of the deferred learning module
553
+ - Dual licensing: MIT for code, commercial addendum for production use.
554
+
555
+ ### Known limitations
556
+
557
+ - `LearningCycle` (`src/learning/index.cjs`) is **not bundled** because
558
+ it depends on `pantheon-agents.js`, which was not extracted. See
559
+ `docs/LEARNING.md` for the unblock plan.
560
+ - `detectPatterns` uses regex heuristics for v0.1. v0.2 will replace it
561
+ with a trained classifier benchmarked against NeMo / Llama Guard /
562
+ Lakera / Guardrails AI. The Mahā-vrata layer above stays unchanged.
563
+ - Bundle size is ~42 KB minified (ESM) — larger than the 18 KB target
564
+ hinted at in early README drafts. The rule data tables make up the
565
+ bulk; the algorithm itself is small. README states the actual size.
566
+
567
+ ## [0.1.0] — TBD
568
+
569
+ Initial public release. Pending:
570
+ - LICENSE-MIT.md and LICENSE-COMMERCIAL.md final wording review
571
+ - npm publish (one-way; held until the README and PITCH are
572
+ cross-checked one more time)
package/LICENSE-COMMERCIAL.md ADDED
@@ -0,0 +1,76 @@
1
+ # Commercial Use Addendum
2
+
3
+ > This is a **template**, not a fully negotiated commercial agreement.
4
+ > Final terms for production use at scale (corporate AI products,
5
+ > commercial SaaS, internal corporate AI services) are negotiated
6
+ > separately. Contact information at the bottom of this file.
7
+
8
+ ## What MIT covers
9
+
10
+ `pantheon-guard` is dual-licensed. Under [LICENSE-MIT.md](./LICENSE-MIT.md)
11
+ you may use, copy, modify, distribute and embed the package in:
12
+
13
+ - personal projects
14
+ - educational / academic use
15
+ - open-source projects
16
+ - non-commercial research
17
+ - evaluation and pilot deployments
18
+
19
+ The MIT license requires attribution: keep the copyright notice and
20
+ permission notice in all copies or substantial portions of the code.
21
+
22
+ ## What requires a commercial subscription
23
+
24
+ A commercial subscription is required when:
25
+
26
+ 1. You embed `pantheon-guard` in a **commercial product** that you
27
+ sell or license to customers;
28
+ 2. You deploy `pantheon-guard` at runtime inside a **production**
29
+ commercial AI service or SaaS;
30
+ 3. You use `pantheon-guard` **internally** at a corporation with
31
+ more than 50 employees in production-facing AI systems.
32
+
33
+ Evaluation, pilots, and trials do not require a subscription.
34
+
35
+ ## Pricing tiers (indicative — final pricing per contract)
36
+
37
+ | Tier | Use case | Indicative price |
38
+ |-------------|-------------------------------------------|------------------|
39
+ | Free | Personal / OSS / educational / pilots | $0 |
40
+ | Starter | Small commercial projects, < $1M ARR | $29 / month |
41
+ | Team | Mid-size SaaS or internal corp deployment | $199 / month |
42
+ | Enterprise | Large-scale production, custom rules, SLA | $1,990+ / month |
43
+ | Strategic | OEM, embedding in another guardrails suite | Negotiated |
44
+
45
+ Numbers are placeholders for the launch period. Final pricing depends
46
+ on volume, support requirements, and any custom-rule co-development.
47
+
48
+ ## What you get with a paid subscription
49
+
50
+ - Permission to use `pantheon-guard` per the use cases above
51
+ - Email support, response within 2 business days
52
+ - Priority on new rules, additional language patterns, and benchmark
53
+ data releases
54
+ - Optional consulting on custom rule sets and integration patterns
55
+ - A signed commercial license document for procurement
56
+
57
+ ## Contact
58
+
59
+ For commercial subscriptions, custom rule development, or integration
60
+ support: see the email address listed in `package.json` under `author`.
61
+
62
+ ## Why dual-license
63
+
64
+ The MIT license keeps the package usable for OSS projects, individual
65
+ developers, researchers, and pilot evaluations — including by the major
66
+ guardrails suites whose users we want to reach. The commercial addendum
67
+ funds continued development of the deterministic rule layer and the
68
+ v0.2 classifier without venture-capital pressure to compromise the
69
+ deterministic nature of the core.
70
+
71
+ ## Note on this template
72
+
73
+ This document is **not legal advice** and is not a complete commercial
74
+ agreement. It signals the intent and shape of the commercial terms.
75
+ The final document delivered with a paid subscription is reviewed by
76
+ counsel and may differ in specifics.
package/LICENSE-MIT.md ADDED
@@ -0,0 +1,21 @@
1
+ # MIT License
2
+
3
+ Copyright (c) 2026 Aleksandr Baryshnikov / Pantheon
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.