npm - patina-cli - Versions diffs - 3.11.0 → 4.0.0 - Mend

patina-cli 3.11.0 → 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (193)

package/.patina.default.yaml +29 -29
package/CHANGELOG.md +53 -0
package/NOTICE +21 -0
package/README.md +117 -224
package/README_JA.md +134 -77
package/README_KR.md +132 -74
package/README_ZH.md +137 -80
package/SKILL.md +11 -20
package/artifacts/rebaseline-2025/README.md +147 -0
package/artifacts/rebaseline-2025/human-controls.public.jsonl +250 -0
package/artifacts/rebaseline-2025/intake.example.jsonl +2 -0
package/artifacts/rebaseline-2025/intake.local.example.jsonl +25 -0
package/artifacts/rebaseline-2025/prompts.template.jsonl +7 -0
package/artifacts/rebaseline-2025/sources.ko-public.jsonl +39 -0
package/assets/brand/patina-badge.svg +18 -0
package/assets/brand/patina-mark.svg +8 -0
package/assets/demo/README.md +79 -0
package/core/scoring.md +12 -12
package/core/standalone-prompt.md +3 -1
package/core/stylometry.md +93 -22
package/docs/API.md +1554 -0
package/docs/AUTHENTICATION.md +50 -26
package/docs/AUTHENTICATION_KR.md +54 -29
package/docs/BRANDING.md +9 -8
package/docs/CLI.md +55 -14
package/docs/COOKBOOK.md +8 -21
package/docs/DEMO.md +32 -5
package/docs/EXIT-CODES.md +2 -3
package/docs/FALSE-POSITIVES.md +63 -0
package/docs/FAQ.md +9 -1
package/docs/FAQ_KR.md +3 -1
package/docs/FLAG-PARITY.md +33 -47
package/docs/ISSUE-WAVES.md +57 -0
package/docs/PATTERNS-EN.md +67 -3
package/docs/PATTERNS-JA.md +68 -2
package/docs/PATTERNS-KO.md +70 -7
package/docs/PATTERNS-ZH.md +67 -3
package/docs/PATTERNS.md +5 -5
package/docs/RESEARCH-DOCS-PLATFORM.md +54 -0
package/docs/ROADMAP.md +46 -66
package/docs/TRANSLATIONESE-KO.md +51 -0
package/docs/audits/2026-05-deep-research.md +3 -1
package/docs/benchmarks/README.md +51 -0
package/docs/benchmarks/detector-comparison.json +69 -9
package/docs/benchmarks/detector-comparison.md +10 -5
package/docs/benchmarks/katfish-ko-latest.json +657 -0
package/docs/benchmarks/katfish-ko-latest.md +77 -0
package/docs/benchmarks/latest.json +1183 -108
package/docs/benchmarks/latest.md +84 -60
package/docs/benchmarks/lexicon-freshness-en-2026-05-22.json +1121 -0
package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md +136 -0
package/docs/benchmarks/rebaseline-latest.json +381 -0
package/docs/benchmarks/rebaseline-latest.md +121 -0
package/docs/benchmarks/register-stratified-latest.json +164 -0
package/docs/benchmarks/register-stratified-latest.md +99 -0
package/docs/benchmarks/register-stratified.md +43 -0
package/docs/integrations/github-action.md +44 -11
package/docs/integrations/playground.md +58 -0
package/docs/integrations/pre-commit.md +5 -5
package/docs/integrations/release.md +5 -3
package/docs/integrations/static-sites.md +83 -0
package/docs/research/2025-rebaseline-plan.md +71 -2
package/docs/research/2026-rebaseline.md +102 -0
package/docs/research/adversarial-mps.md +41 -0
package/docs/research/ai-human-metrics.md +35 -23
package/docs/research/human-eval-panel.md +42 -0
package/docs/research/judge-agreement.md +24 -0
package/docs/research/ko-2025-corpus-sources.md +135 -0
package/docs/research/lexicon-freshness-audit.md +64 -0
package/docs/research/zh-ja-lexicon-calibration.md +60 -0
package/docs/social/patina-launch-copy.md +173 -100
package/docs/social/patina-launch-execution.md +94 -0
package/docs/social/patina-launch-korean-first.md +83 -0
package/docs/social/signs-of-ai-writing.md +26 -0
package/docs/social/signs-of-ai-writing_KR.md +26 -0
package/lexicon/ai-en.md +21 -24
package/lexicon/ai-ja.md +158 -0
package/lexicon/ai-ko.md +9 -9
package/lexicon/ai-zh.md +158 -0
package/lexicon/provenance/ai-en.json +970 -0
package/lexicon/provenance/ai-ja.json +542 -0
package/lexicon/provenance/ai-ko.json +866 -0
package/lexicon/provenance/ai-zh.json +542 -0
package/package.json +49 -8
package/patterns/en-communication.md +5 -0
package/patterns/en-content.md +5 -0
package/patterns/en-filler.md +5 -0
package/patterns/en-language.md +29 -1
package/patterns/en-structure.md +5 -0
package/patterns/en-style.md +5 -0
package/patterns/en-viral-hook.md +42 -2
package/patterns/ja-communication.md +5 -0
package/patterns/ja-content.md +5 -0
package/patterns/ja-filler.md +5 -0
package/patterns/ja-language.md +33 -1
package/patterns/ja-structure.md +12 -0
package/patterns/ja-style.md +5 -0
package/patterns/ja-viral-hook.md +41 -2
package/patterns/ko-communication.md +5 -0
package/patterns/ko-content.md +5 -0
package/patterns/ko-filler.md +5 -0
package/patterns/ko-language.md +33 -1
package/patterns/ko-structure.md +25 -6
package/patterns/ko-style.md +5 -0
package/patterns/ko-viral-hook.md +38 -2
package/patterns/zh-communication.md +5 -0
package/patterns/zh-content.md +5 -0
package/patterns/zh-filler.md +5 -0
package/patterns/zh-language.md +37 -1
package/patterns/zh-structure.md +12 -0
package/patterns/zh-style.md +5 -0
package/patterns/zh-viral-hook.md +38 -2
package/playground/README.md +55 -0
package/playground/analytics.js +4 -0
package/playground/analyzer.js +883 -0
package/playground/app.js +157 -0
package/playground/data/lexicons.js +343 -0
package/playground/index.html +138 -0
package/playground/styles.css +267 -0
package/profiles/namuwiki.md +111 -0
package/scripts/adversarial-mps-report.mjs +201 -0
package/scripts/badge-json.mjs +79 -0
package/scripts/benchmark-report.mjs +56 -9
package/scripts/check-release-metadata.mjs +0 -2
package/scripts/detector-comparison.mjs +7 -7
package/scripts/generate-playground-data.mjs +77 -0
package/scripts/katfish-calibration.mjs +464 -0
package/scripts/lexicon-freshness.mjs +485 -0
package/scripts/lint.mjs +1 -1
package/scripts/precommit-score.mjs +4 -3
package/scripts/prose-score.mjs +81 -5
package/scripts/rebaseline-intake.mjs +242 -0
package/scripts/rebaseline-score.mjs +268 -0
package/scripts/rebaseline-summary.mjs +773 -0
package/scripts/rebaseline-web-collect.mjs +410 -0
package/scripts/update-benchmark-ranges.mjs +1 -0
package/src/api.js +69 -105
package/src/auth.js +50 -2
package/src/backends/claude-cli.js +19 -4
package/src/backends/codex-cli.js +19 -3
package/src/backends/contract.js +230 -1
package/src/backends/gemini-cli.js +18 -5
package/src/backends/index.js +87 -12
package/src/backends/kimi-cli.js +161 -0
package/src/cli.js +577 -567
package/src/commands/doctor.js +2 -2
package/src/config.js +29 -0
package/src/errors.js +53 -1
package/src/features/discourse-tells.js +68 -0
package/src/features/index.js +82 -8
package/src/features/lexicon.js +40 -6
package/src/features/markup-leakage.js +69 -0
package/src/features/segment.js +41 -0
package/src/features/signal-strength.js +81 -0
package/src/features/stylometry.js +231 -1
package/src/features/translationese.js +127 -0
package/src/loader.js +76 -0
package/src/logger.js +22 -23
package/src/model-defaults.js +55 -0
package/src/ouroboros.js +31 -0
package/src/output.js +102 -90
package/src/prompt-builder.js +103 -68
package/src/providers.js +51 -4
package/src/scoring.js +210 -2
package/src/security.js +75 -0
package/tests/fixtures/live-quality/en/public-docs-01.md +26 -0
package/tests/fixtures/live-quality/ko/public-docs-01.md +26 -0
package/tests/fixtures/suspect-zones/expected-ranges.json +207 -16
package/tests/fixtures/suspect-zones/ja/ai/ja-ai-04-lexicon.md +11 -0
package/tests/fixtures/suspect-zones/ja/natural/ja-nat-04-lexicon-cold.md +11 -0
package/tests/fixtures/suspect-zones/ko/ai/ko-ai-02.md +4 -5
package/tests/fixtures/suspect-zones/ko/ai/ko-ai-07-ko-diagnostic.md +11 -0
package/tests/fixtures/suspect-zones/zh/ai/zh-ai-04-lexicon.md +11 -0
package/tests/fixtures/suspect-zones/zh/natural/zh-nat-04-lexicon-cold.md +11 -0
package/tests/quality/README.md +188 -11
package/tests/quality/adversarial-mps/fixtures.jsonl +10 -0
package/tests/quality/benchmark.mjs +39 -1
package/tests/quality/dogfood.mjs +5 -3
package/tests/quality/live-fixtures.jsonl +2 -0
package/tests/quality/live-quality.mjs +596 -0
package/tests/quality/ranking-metrics.mjs +136 -0
package/tests/quality/rebaseline-manifest.example.jsonl +5 -0
package/vercel.json +53 -0
package/SKILL-MAX.md +0 -455
package/docs/internal/HARNESS.md +0 -14
package/docs/internal/README.md +0 -14
package/docs/internal/WARP.md +0 -23
package/patina-max/SKILL.md +0 -523
package/patina-max/composite.py +0 -457
package/src/cache.js +0 -106
package/src/commands/init.js +0 -208
package/src/manifest.js +0 -162
package/src/max-mode.js +0 -207

package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md ADDED

@@ -0,0 +1,136 @@
+# Lexicon Freshness Lift Report
+- Language: en
+- Source: hape-en-gpt4o-vs-human-2026-05-22
+- Validated at: 2026-05-22
+- Input: artifacts/rebaseline-2025/private/hape-en.private.jsonl
+- Entries evaluated: 108
+- Decision summary: 88 keep / 20 drop
+- Gate: **PASS** (8290 hot docs, 8290 cold docs)
+- Source note: HAP-E MIT English paired corpus: GPT-4o 2024-08-06 continuations vs human chunk_2; raw text kept local/private, aggregate only committed.
+## Source provenance
+- <https://huggingface.co/datasets/browndw/human-ai-parallel-corpus>
+- <https://cmustatistics.github.io/data-repository/language/hap-e.html>
+- Public report policy: aggregate counts only; raw corpus rows stay local/private.
+## Register coverage
+| class | registers |
+|---|---|
+| hot | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
+| cold | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
+## Entry decisions
+| decision | kind | entry | hot docs | cold docs | lift | cold rate |
+|---|---|---|---:|---:|---:|---:|
+| drop | phrase | a host of | 9 | 14 | 0.64 | 0.17% |
+| drop | phrase | a wide range of | 26 | 47 | 0.55 | 0.57% |
+| drop | phrase | close the gap | 1 | 1 | 1 | 0.01% |
+| drop | phrase | driving force | 25 | 7 | 3.57 | 0.08% |
+| drop | phrase | end-to-end | 1 | 3 | 0.33 | 0.04% |
+| drop | phrase | gain a deeper understanding | 0 | 0 | 0 | 0.00% |
+| drop | phrase | in the age of | 6 | 4 | 1.5 | 0.05% |
+| drop | phrase | it is essential to | 32 | 13 | 2.46 | 0.16% |
+| drop | phrase | key drivers | 1 | 3 | 0.33 | 0.04% |
+| drop | phrase | on the other hand | 157 | 158 | 0.99 | 1.91% |
+| drop | phrase | play a key role | 1 | 5 | 0.2 | 0.06% |
+| drop | phrase | to ensure that | 140 | 43 | 3.26 | 0.52% |
+| drop | phrase | under the hood | 3 | 2 | 1.5 | 0.02% |
+| drop | strict | dimensions | 179 | 46 | 3.89 | 0.55% |
+| drop | strict | elevated | 74 | 61 | 1.21 | 0.74% |
+| drop | strict | enable | 140 | 82 | 1.71 | 0.99% |
+| drop | strict | framework | 380 | 129 | 2.95 | 1.56% |
+| drop | strict | state-of-the-art | 36 | 24 | 1.5 | 0.29% |
+| drop | strict | unleash | 45 | 13 | 3.46 | 0.16% |
+| drop | strict | workflow | 11 | 8 | 1.38 | 0.10% |
+| keep | phrase | a deeper dive | 7 | 0 | Infinity | 0.00% |
+| keep | phrase | a myriad of | 67 | 3 | 22.33 | 0.04% |
+| keep | phrase | a new chapter | 138 | 0 | Infinity | 0.00% |
+| keep | phrase | a new era | 130 | 9 | 14.44 | 0.11% |
+| keep | phrase | a new frontier | 8 | 0 | Infinity | 0.00% |
+| keep | phrase | a plethora of | 26 | 5 | 5.2 | 0.06% |
+| keep | phrase | a robust framework | 28 | 0 | Infinity | 0.00% |
+| keep | phrase | a wide array of | 13 | 1 | 13 | 0.01% |
+| keep | phrase | at its core | 48 | 2 | 24 | 0.02% |
+| keep | phrase | at the forefront | 95 | 2 | 47.5 | 0.02% |
+| keep | phrase | at the heart of | 143 | 18 | 7.94 | 0.22% |
+| keep | phrase | best practices | 51 | 6 | 8.5 | 0.07% |
+| keep | phrase | bridge the gap | 94 | 3 | 31.33 | 0.04% |
+| keep | phrase | comprehensive approach | 33 | 1 | 33 | 0.01% |
+| keep | phrase | continuous improvement | 28 | 1 | 28 | 0.01% |
+| keep | phrase | ever-changing | 74 | 0 | Infinity | 0.00% |
+| keep | phrase | ever-evolving | 144 | 0 | Infinity | 0.00% |
+| keep | phrase | fast-paced | 53 | 3 | 17.67 | 0.04% |
+| keep | phrase | gain valuable insights | 2 | 0 | Infinity | 0.00% |
+| keep | phrase | glean insights | 3 | 0 | Infinity | 0.00% |
+| keep | phrase | harness the power | 8 | 0 | Infinity | 0.00% |
+| keep | phrase | holistic approach | 128 | 4 | 32 | 0.05% |
+| keep | phrase | in the digital age | 23 | 0 | Infinity | 0.00% |
+| keep | phrase | in the modern era | 7 | 1 | 7 | 0.01% |
+| keep | phrase | in today's | 69 | 17 | 4.06 | 0.21% |
+| keep | phrase | key insights | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | key takeaways | 2 | 0 | Infinity | 0.00% |
+| keep | phrase | pave the path | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | pave the way | 133 | 1 | 133 | 0.01% |
+| keep | phrase | play a crucial role | 75 | 4 | 18.75 | 0.05% |
+| keep | phrase | plays a vital role | 11 | 1 | 11 | 0.01% |
+| keep | phrase | rapidly changing | 42 | 1 | 42 | 0.01% |
+| keep | phrase | rapidly evolving | 32 | 1 | 32 | 0.01% |
+| keep | phrase | realize the potential | 3 | 0 | Infinity | 0.00% |
+| keep | phrase | the bigger picture | 22 | 2 | 11 | 0.02% |
+| keep | phrase | the competitive landscape | 1 | 0 | Infinity | 0.00% |
+| keep | phrase | the digital landscape | 13 | 0 | Infinity | 0.00% |
+| keep | phrase | the future of | 212 | 24 | 8.83 | 0.29% |
+| keep | phrase | the landscape of | 134 | 1 | 134 | 0.01% |
+| keep | phrase | the realm of | 224 | 7 | 32 | 0.08% |
+| keep | phrase | the regulatory landscape | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | the world of | 241 | 42 | 5.74 | 0.51% |
+| keep | phrase | unlock the potential | 6 | 0 | Infinity | 0.00% |
+| keep | phrase | usher in | 37 | 6 | 6.17 | 0.07% |
+| keep | phrase | valuable insights | 124 | 3 | 41.33 | 0.04% |
+| keep | strict | accelerate | 69 | 17 | 4.06 | 0.21% |
+| keep | strict | actionable | 104 | 3 | 34.67 | 0.04% |
+| keep | strict | align | 370 | 17 | 21.76 | 0.21% |
+| keep | strict | alignment | 135 | 23 | 5.87 | 0.28% |
+| keep | strict | amplify | 117 | 5 | 23.4 | 0.06% |
+| keep | strict | bespoke | 69 | 8 | 8.63 | 0.10% |
+| keep | strict | bolster | 175 | 12 | 14.58 | 0.14% |
+| keep | strict | catalyst | 161 | 26 | 6.19 | 0.31% |
+| keep | strict | compelling | 340 | 27 | 12.59 | 0.33% |
+| keep | strict | curated | 106 | 7 | 15.14 | 0.08% |
+| keep | strict | cutting-edge | 165 | 1 | 165 | 0.01% |
+| keep | strict | dynamic | 765 | 110 | 6.95 | 1.33% |
+| keep | strict | ecosystem | 205 | 48 | 4.27 | 0.58% |
+| keep | strict | elevate | 106 | 4 | 26.5 | 0.05% |
+| keep | strict | empower | 142 | 4 | 35.5 | 0.05% |
+| keep | strict | empowering | 166 | 7 | 23.71 | 0.08% |
+| keep | strict | enabling | 263 | 39 | 6.74 | 0.47% |
+| keep | strict | envision | 117 | 9 | 13 | 0.11% |
+| keep | strict | ethical | 259 | 25 | 10.36 | 0.30% |
+| keep | strict | harness | 218 | 14 | 15.57 | 0.17% |
+| keep | strict | impactful | 83 | 3 | 27.67 | 0.04% |
+| keep | strict | inclusive | 205 | 18 | 11.39 | 0.22% |
+| keep | strict | inflection | 12 | 0 | Infinity | 0.00% |
+| keep | strict | meaningful | 305 | 48 | 6.35 | 0.58% |
+| keep | strict | modalities | 61 | 15 | 4.07 | 0.18% |
+| keep | strict | pivot | 184 | 5 | 36.8 | 0.06% |
+| keep | strict | prioritize | 239 | 3 | 79.67 | 0.04% |
+| keep | strict | reimagine | 22 | 0 | Infinity | 0.00% |
+| keep | strict | rethink | 45 | 3 | 15 | 0.04% |
+| keep | strict | scalable | 34 | 5 | 6.8 | 0.06% |
+| keep | strict | seamless | 176 | 4 | 44 | 0.05% |
+| keep | strict | seamlessly | 352 | 9 | 39.11 | 0.11% |
+| keep | strict | skillset | 4 | 0 | Infinity | 0.00% |
+| keep | strict | streamline | 42 | 3 | 14 | 0.04% |
+| keep | strict | streamlined | 26 | 3 | 8.67 | 0.04% |
+| keep | strict | sustainable | 690 | 67 | 10.3 | 0.81% |
+| keep | strict | thoughtful | 228 | 33 | 6.91 | 0.40% |
+| keep | strict | thrive | 279 | 18 | 15.5 | 0.22% |
+| keep | strict | thriving | 137 | 6 | 22.83 | 0.07% |
+| keep | strict | toolkit | 39 | 3 | 13 | 0.04% |
+| keep | strict | transformative | 417 | 5 | 83.4 | 0.06% |
+| keep | strict | unlock | 165 | 15 | 11 | 0.18% |
+| keep | strict | vibrant | 989 | 13 | 76.08 | 0.16% |
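
The lift and cold-rate columns can be recomputed from the document counts. Every row above matches lift = hot docs / cold docs, rounded to two decimals, with `Infinity` when cold docs is 0 and `0` when both counts are 0. Every row also matches cold rate = cold docs / 8290. The following JavaScript sketch reproduces that arithmetic. The keep/drop rule is an assumption: the hypothetical `MIN_LIFT` of 4 happens to separate every kept row from every dropped row here, but the actual criteria in `scripts/lexicon-freshness.mjs` are not part of this diff and may include other conditions. All function names are illustrative.

```javascript
// Sketch of the arithmetic behind the "Entry decisions" table (illustrative names).
// lift = hotDocs / coldDocs and coldRate = coldDocs / COLD_TOTAL match every row;
// MIN_LIFT is a hypothetical threshold, not a value taken from the package.
const COLD_TOTAL = 8290; // cold docs reported on the "Gate" line
const MIN_LIFT = 4; // hypothetical keep threshold

// Ratio of hot to cold document hits, rounded to two decimals like the report.
function lift(hotDocs, coldDocs) {
  if (coldDocs === 0) return hotDocs > 0 ? Infinity : 0; // 0/0 is reported as 0
  return Math.round((hotDocs / coldDocs) * 100) / 100;
}

// Share of cold documents containing the entry, formatted as a percentage.
function coldRate(coldDocs, coldTotal = COLD_TOTAL) {
  return `${((coldDocs / coldTotal) * 100).toFixed(2)}%`;
}

// Hypothetical decision rule: keep an entry once its lift reaches MIN_LIFT.
function decide(hotDocs, coldDocs) {
  return lift(hotDocs, coldDocs) >= MIN_LIFT ? "keep" : "drop";
}

// A few rows from the table: [entry, hot docs, cold docs].
const rows = [
  ["a host of", 9, 14],
  ["driving force", 25, 7],
  ["in today's", 69, 17],
  ["a deeper dive", 7, 0],
  ["gain a deeper understanding", 0, 0],
];
for (const [entry, hot, cold] of rows) {
  console.log(`${decide(hot, cold)}  ${entry}  lift=${lift(hot, cold)}  cold rate=${coldRate(cold)}`);
}
```

Any cutoff above 3.89 (the highest dropped lift, `dimensions`) and at or below 4.06 (the lowest kept finite lift) reproduces the published decisions. The table alone therefore does not pin down the real cutoff.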

package/docs/benchmarks/rebaseline-latest.json ADDED

@@ -0,0 +1,381 @@
+{
+  "schemaVersion": 1,
+  "generatedAt": "2026-05-21T18:13:21.576Z",
+  "input": "artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl",
+  "targets": {
+    "protocolPerLanguageClassRegister": 25,
+    "claimPerCell": 100,
+    "claimLanguages": 2,
+    "claimGeneratorFamilies": 3
+  },
+  "totalRecords": 800,
+  "byLanguage": {
+    "ko": 400,
+    "en": 400
+  },
+  "byClass": {
+    "natural-human": 200,
+    "ai-like": 600
+  },
+  "byRegister": {
+    "product-doc": 140,
+    "academic-summary": 190,
+    "chat-update": 140,
+    "blog": 190,
+    "technical-how-to": 140
+  },
+  "byModelFamily": {
+    "human-reference": 200,
+    "claude-family": 200,
+    "gemini-family": 200,
+    "gpt-family": 200
+  },
+  "protocolCoverage": {
+    "totalCells": 80,
+    "populatedCells": 17,
+    "emptyCells": 63,
+    "cellsMeetingTarget": 12,
+    "underfilledCells": [
+      {
+        "key": "ko|natural-human|blog",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|academic-summary",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|product-doc",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|chat-update",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|technical-how-to",
+        "count": 20
+      }
+    ]
+  },
+  "claimGate": {
+    "ready": true,
+    "blockers": [],
+    "qualifiedPositiveCells": [
+      {
+        "key": "en|claude-family",
+        "count": 100
+      },
+      {
+        "key": "en|gemini-family",
+        "count": 100
+      },
+      {
+        "key": "en|gpt-family",
+        "count": 100
+      },
+      {
+        "key": "ko|claude-family",
+        "count": 100
+      },
+      {
+        "key": "ko|gemini-family",
+        "count": 100
+      },
+      {
+        "key": "ko|gpt-family",
+        "count": 100
+      }
+    ],
+    "qualifiedNaturalCells": [
+      {
+        "key": "ko",
+        "count": 100
+      },
+      {
+        "key": "en",
+        "count": 100
+      }
+    ]
+  },
+  "metrics": {
+    "tp": 404,
+    "fp": 32,
+    "fn": 196,
+    "tn": 168,
+    "total": 800,
+    "accuracy": 0.715,
+    "precision": 0.927,
+    "recall": 0.673,
+    "f1": 0.78,
+    "falsePositiveRate": 0.16,
+    "falseNegativeRate": 0.327,
+    "accuracyCi": {
+      "low": 0.683,
+      "high": 0.745,
+      "method": "Wilson score interval, 95%"
+    },
+    "recallCi": {
+      "low": 0.635,
+      "high": 0.71,
+      "method": "Wilson score interval, 95%"
+    },
+    "falsePositiveRateCi": {
+      "low": 0.116,
+      "high": 0.217,
+      "method": "Wilson score interval, 95%"
+    }
+  },
+  "catchByLanguageFamily": {
+    "en|claude-family": {
+      "language": "en",
+      "modelFamily": "claude-family",
+      "n": 100,
+      "caught": 74,
+      "missed": 26,
+      "catchRate": 0.74,
+      "catchRateCi": {
+        "low": 0.646,
+        "high": 0.816,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "en|gemini-family": {
+      "language": "en",
+      "modelFamily": "gemini-family",
+      "n": 100,
+      "caught": 79,
+      "missed": 21,
+      "catchRate": 0.79,
+      "catchRateCi": {
+        "low": 0.7,
+        "high": 0.858,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "en|gpt-family": {
+      "language": "en",
+      "modelFamily": "gpt-family",
+      "n": 100,
+      "caught": 77,
+      "missed": 23,
+      "catchRate": 0.77,
+      "catchRateCi": {
+        "low": 0.678,
+        "high": 0.842,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|claude-family": {
+      "language": "ko",
+      "modelFamily": "claude-family",
+      "n": 100,
+      "caught": 68,
+      "missed": 32,
+      "catchRate": 0.68,
+      "catchRateCi": {
+        "low": 0.583,
+        "high": 0.763,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|gemini-family": {
+      "language": "ko",
+      "modelFamily": "gemini-family",
+      "n": 100,
+      "caught": 62,
+      "missed": 38,
+      "catchRate": 0.62,
+      "catchRateCi": {
+        "low": 0.522,
+        "high": 0.709,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|gpt-family": {
+      "language": "ko",
+      "modelFamily": "gpt-family",
+      "n": 100,
+      "caught": 44,
+      "missed": 56,
+      "catchRate": 0.44,
+      "catchRateCi": {
+        "low": 0.347,
+        "high": 0.538,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "falsePositiveByLanguage": {
+    "en": {
+      "language": "en",
+      "n": 100,
+      "falsePositives": 14,
+      "trueNegatives": 86,
+      "falsePositiveRate": 0.14,
+      "falsePositiveRateCi": {
+        "low": 0.085,
+        "high": 0.221,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko": {
+      "language": "ko",
+      "n": 100,
+      "falsePositives": 18,
+      "trueNegatives": 82,
+      "falsePositiveRate": 0.18,
+      "falsePositiveRateCi": {
+        "low": 0.117,
+        "high": 0.267,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "metricsByRegister": {
+    "academic-summary": {
+      "tp": 89,
+      "fp": 18,
+      "fn": 31,
+      "tn": 52,
+      "total": 190,
+      "accuracy": 0.742,
+      "precision": 0.832,
+      "recall": 0.742,
+      "f1": 0.784,
+      "falsePositiveRate": 0.257,
+      "falseNegativeRate": 0.258,
+      "accuracyCi": {
+        "low": 0.676,
+        "high": 0.799,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.657,
+        "high": 0.812,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.169,
+        "high": 0.37,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "blog": {
+      "tp": 70,
+      "fp": 6,
+      "fn": 50,
+      "tn": 64,
+      "total": 190,
+      "accuracy": 0.705,
+      "precision": 0.921,
+      "recall": 0.583,
+      "f1": 0.714,
+      "falsePositiveRate": 0.086,
+      "falseNegativeRate": 0.417,
+      "accuracyCi": {
+        "low": 0.637,
+        "high": 0.766,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.494,
+        "high": 0.668,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.04,
+        "high": 0.175,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "chat-update": {
+      "tp": 61,
+      "fp": 0,
+      "fn": 59,
+      "tn": 20,
+      "total": 140,
+      "accuracy": 0.579,
+      "precision": 1,
+      "recall": 0.508,
+      "f1": 0.674,
+      "falsePositiveRate": 0,
+      "falseNegativeRate": 0.492,
+      "accuracyCi": {
+        "low": 0.496,
+        "high": 0.657,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.42,
+        "high": 0.596,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0,
+        "high": 0.161,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "product-doc": {
+      "tp": 94,
+      "fp": 2,
+      "fn": 26,
+      "tn": 18,
+      "total": 140,
+      "accuracy": 0.8,
+      "precision": 0.979,
+      "recall": 0.783,
+      "f1": 0.87,
+      "falsePositiveRate": 0.1,
+      "falseNegativeRate": 0.217,
+      "accuracyCi": {
+        "low": 0.726,
+        "high": 0.858,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.701,
+        "high": 0.848,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.028,
+        "high": 0.301,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "technical-how-to": {
+      "tp": 90,
+      "fp": 6,
+      "fn": 30,
+      "tn": 14,
+      "total": 140,
+      "accuracy": 0.743,
+      "precision": 0.938,
+      "recall": 0.75,
+      "f1": 0.833,
+      "falsePositiveRate": 0.3,
+      "falseNegativeRate": 0.25,
+      "accuracyCi": {
+        "low": 0.665,
+        "high": 0.808,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.666,
+        "high": 0.819,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.145,
+        "high": 0.519,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "validation": {
+    "errors": [],
+    "warnings": []
+  }
+}
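
Every rate in this file is derived from its TP/FP/FN/TN counts, and each `*Ci` object is labeled as a 95% Wilson score interval. The following JavaScript sketch recomputes the top-level `metrics` block from `tp: 404, fp: 32, fn: 196, tn: 168`. With three-decimal rounding it reproduces the published values, including all three intervals. Accuracy uses (tp + tn) / total, recall uses tp / (tp + fn), and the false-positive rate uses fp / (fp + tn). The helper names `round3`, `wilson`, and `metrics` are illustrative and are not the package's API.

```javascript
// Recomputes a "metrics" block from confusion-matrix counts (illustrative names).
function round3(x) {
  return Math.round(x * 1000) / 1000; // three decimals, as in the JSON
}

// Wilson score interval for k successes out of n trials; z = 1.96 gives 95%.
function wilson(k, n, z = 1.96) {
  if (n === 0) return { low: 0, high: 0 };
  const p = k / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  // Clamp only to absorb floating-point noise; Wilson bounds lie in [0, 1].
  return {
    low: round3(Math.max(0, center - half)),
    high: round3(Math.min(1, center + half)),
  };
}

function metrics({ tp, fp, fn, tn }) {
  const total = tp + fp + fn + tn;
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  return {
    total,
    accuracy: round3((tp + tn) / total),
    precision: round3(precision),
    recall: round3(recall),
    f1: round3((2 * precision * recall) / (precision + recall)),
    falsePositiveRate: round3(fp / (fp + tn)),
    falseNegativeRate: round3(fn / (tp + fn)),
    accuracyCi: wilson(tp + tn, total),
    recallCi: wilson(tp, tp + fn),
    falsePositiveRateCi: wilson(fp, fp + tn),
  };
}

console.log(metrics({ tp: 404, fp: 32, fn: 196, tn: 168 }));
```

A Wilson interval stays inside [0, 1] even for small or extreme cells, where the simple normal approximation does not. The `chat-update` false-positive rate (0 of 20, interval 0 to 0.161) is such a case.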

package/docs/benchmarks/rebaseline-latest.md ADDED

@@ -0,0 +1,121 @@
+# Rebaseline Manifest Summary
+- Generated at: 2026-05-21T18:13:21.576Z
+- Input: `artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl`
+- Records: 800
+- Protocol target: 25 samples per language × class × register cell
+- Public claim target: 100 samples per claim cell, 2+ languages, 3+ generator families
+## Validation
+Validation: **PASS**
+## Coverage snapshot
+### By language
+| value | n |
+|---|---:|
+| ko | 400 |
+| en | 400 |
+| zh | 0 |
+| ja | 0 |
+### By class
+| value | n |
+|---|---:|
+| ai-like | 600 |
+| natural-human | 200 |
+| lightly-edited-ai | 0 |
+| heavily-edited-ai | 0 |
+### By register
+| value | n |
+|---|---:|
+| blog | 190 |
+| academic-summary | 190 |
+| product-doc | 140 |
+| chat-update | 140 |
+| technical-how-to | 140 |
+### By model family
+| value | n |
+|---|---:|
+| gpt-family | 200 |
+| claude-family | 200 |
+| gemini-family | 200 |
+| open-weight | 0 |
+| human-reference | 200 |
+## Protocol matrix
+- Populated language × class × register cells: 17/80
+- Cells meeting 25+ samples: 12
+- Empty cells: 63
+- Underfilled populated cells: 5
+| cell | n |
+|---|---:|
+| ko × natural-human × blog | 20 |
+| ko × natural-human × academic-summary | 20 |
+| ko × natural-human × product-doc | 20 |
+| ko × natural-human × chat-update | 20 |
+| ko × natural-human × technical-how-to | 20 |
+## Public performance claim gate
+Public performance claim: **READY**
+Gate conditions met by this manifest.
+| claim-gate count | value |
+|---|---:|
+| qualified positive cells (language × generator family, n≥100) | 6 |
+| qualified natural-language cells (language, n≥100) | 2 |
+| outcome rows with expected/predicted labels | 800 |
+## Outcome metrics
+| metric | value |
+|---|---:|
+| accuracy | 71.5% |
+| accuracy CI | 68.3%–74.5% |
+| precision | 92.7% |
+| recall | 67.3% |
+| recall CI | 63.5%–71.0% |
+| F1 | 0.780 |
+| false positive rate | 16.0% |
+| false positive rate CI | 11.6%–21.7% |
+| false negative rate | 32.7% |
+| TP/FP/FN/TN | 404/32/196/168 |
+### Catch rate by language × model family
+| language | model family | n | catch rate | 95% CI | caught/missed |
+|---|---|---:|---:|---:|---:|
+| en | claude-family | 100 | 74.0% | 64.6%–81.6% | 74/26 |
+| en | gemini-family | 100 | 79.0% | 70.0%–85.8% | 79/21 |
+| en | gpt-family | 100 | 77.0% | 67.8%–84.2% | 77/23 |
+| ko | claude-family | 100 | 68.0% | 58.3%–76.3% | 68/32 |
+| ko | gemini-family | 100 | 62.0% | 52.2%–70.9% | 62/38 |
+| ko | gpt-family | 100 | 44.0% | 34.7%–53.8% | 44/56 |
+### False-positive rate by language
+| language | n | false-positive rate | 95% CI | FP/TN |
+|---|---:|---:|---:|---:|
+| en | 100 | 14.0% | 8.5%–22.1% | 14/86 |
+| ko | 100 | 18.0% | 11.7%–26.7% | 18/82 |
+### By register
+| register | n | FP rate | FN rate | TP/FP/FN/TN |
+|---|---:|---:|---:|---:|
+| blog | 190 | 8.6% | 41.7% | 70/6/50/64 |
+| academic-summary | 190 | 25.7% | 25.8% | 89/18/31/52 |
+| product-doc | 140 | 10.0% | 21.7% | 94/2/26/18 |
+| chat-update | 140 | 0.0% | 49.2% | 61/0/59/20 |
+| technical-how-to | 140 | 30.0% | 25.0% | 90/6/30/14 |
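
The protocol matrix is the full cross product of the axis values listed in the coverage tables: 4 languages × 4 classes × 5 registers = 80 cells. The claim gate counts language × generator-family cells and natural-human language cells that have at least 100 samples. The following JavaScript sketch is a hypothetical reconstruction of that bookkeeping. The axis lists come from the tables in this file. The `claimGate` readiness rule is an interpretation of the stated public claim target: at least 2 languages with both a qualified natural cell and a qualified positive cell, plus at least 3 distinct generator families. It is not the logic of `scripts/rebaseline-summary.mjs`. The example runs on a small synthetic record set, not on the real manifest.

```javascript
// Hypothetical sketch of protocol-coverage and claim-gate bookkeeping.
const LANGUAGES = ["ko", "en", "zh", "ja"];
const CLASSES = ["natural-human", "ai-like", "lightly-edited-ai", "heavily-edited-ai"];
const REGISTERS = ["blog", "academic-summary", "product-doc", "chat-update", "technical-how-to"];

function countBy(records, keyFn) {
  const counts = new Map();
  for (const r of records) {
    const key = keyFn(r);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Walks every language × class × register cell and compares it to the target.
function protocolCoverage(records, target = 25) {
  const counts = countBy(records, (r) => `${r.language}|${r.class}|${r.register}`);
  const summary = {
    totalCells: 0,
    populatedCells: 0,
    emptyCells: 0,
    cellsMeetingTarget: 0,
    underfilledCells: [],
  };
  for (const lang of LANGUAGES) {
    for (const cls of CLASSES) {
      for (const reg of REGISTERS) {
        const key = `${lang}|${cls}|${reg}`;
        const count = counts.get(key) ?? 0;
        summary.totalCells += 1;
        if (count === 0) {
          summary.emptyCells += 1;
        } else {
          summary.populatedCells += 1;
          if (count >= target) summary.cellsMeetingTarget += 1;
          else summary.underfilledCells.push({ key, count });
        }
      }
    }
  }
  return summary;
}

// Assumed rule: >= minLanguages languages with a qualified natural cell and at
// least one qualified positive cell, and >= minFamilies distinct families.
function claimGate(records, { perCell = 100, minLanguages = 2, minFamilies = 3 } = {}) {
  const positive = countBy(
    records.filter((r) => r.class !== "natural-human"),
    (r) => `${r.language}|${r.modelFamily}`,
  );
  const natural = countBy(records.filter((r) => r.class === "natural-human"), (r) => r.language);
  const qualifiedPositive = [...positive].filter(([, n]) => n >= perCell).map(([key]) => key);
  const qualifiedNatural = [...natural].filter(([, n]) => n >= perCell).map(([lang]) => lang);
  const families = new Set(qualifiedPositive.map((key) => key.split("|")[1]));
  const languages = qualifiedNatural.filter((lang) =>
    qualifiedPositive.some((key) => key.startsWith(`${lang}|`)),
  );
  const ready = languages.length >= minLanguages && families.size >= minFamilies;
  return { ready, qualifiedPositive, qualifiedNatural };
}

// Synthetic records (not the real manifest): one register, two languages.
const records = [];
for (const language of ["ko", "en"]) {
  for (const modelFamily of ["gpt-family", "claude-family", "gemini-family"]) {
    for (let i = 0; i < 100; i += 1) {
      records.push({ language, class: "ai-like", register: "blog", modelFamily });
    }
  }
  for (let i = 0; i < 100; i += 1) {
    records.push({ language, class: "natural-human", register: "blog", modelFamily: "human-reference" });
  }
}
console.log(protocolCoverage(records));
console.log(claimGate(records));
```

Under the same target of 25, the five `ko × natural-human` cells listed above, at 20 samples each, would land in `underfilledCells` exactly as the summary reports.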