patina-cli 3.11.0 → 4.0.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/.patina.default.yaml +29 -29
- package/CHANGELOG.md +53 -0
- package/NOTICE +21 -0
- package/README.md +117 -224
- package/README_JA.md +134 -77
- package/README_KR.md +132 -74
- package/README_ZH.md +137 -80
- package/SKILL.md +11 -20
- package/artifacts/rebaseline-2025/README.md +147 -0
- package/artifacts/rebaseline-2025/human-controls.public.jsonl +250 -0
- package/artifacts/rebaseline-2025/intake.example.jsonl +2 -0
- package/artifacts/rebaseline-2025/intake.local.example.jsonl +25 -0
- package/artifacts/rebaseline-2025/prompts.template.jsonl +7 -0
- package/artifacts/rebaseline-2025/sources.ko-public.jsonl +39 -0
- package/assets/brand/patina-badge.svg +18 -0
- package/assets/brand/patina-mark.svg +8 -0
- package/assets/demo/README.md +79 -0
- package/core/scoring.md +12 -12
- package/core/standalone-prompt.md +3 -1
- package/core/stylometry.md +93 -22
- package/docs/API.md +1554 -0
- package/docs/AUTHENTICATION.md +50 -26
- package/docs/AUTHENTICATION_KR.md +54 -29
- package/docs/BRANDING.md +9 -8
- package/docs/CLI.md +55 -14
- package/docs/COOKBOOK.md +8 -21
- package/docs/DEMO.md +32 -5
- package/docs/EXIT-CODES.md +2 -3
- package/docs/FALSE-POSITIVES.md +63 -0
- package/docs/FAQ.md +9 -1
- package/docs/FAQ_KR.md +3 -1
- package/docs/FLAG-PARITY.md +33 -47
- package/docs/ISSUE-WAVES.md +57 -0
- package/docs/PATTERNS-EN.md +67 -3
- package/docs/PATTERNS-JA.md +68 -2
- package/docs/PATTERNS-KO.md +70 -7
- package/docs/PATTERNS-ZH.md +67 -3
- package/docs/PATTERNS.md +5 -5
- package/docs/RESEARCH-DOCS-PLATFORM.md +54 -0
- package/docs/ROADMAP.md +46 -66
- package/docs/TRANSLATIONESE-KO.md +51 -0
- package/docs/audits/2026-05-deep-research.md +3 -1
- package/docs/benchmarks/README.md +51 -0
- package/docs/benchmarks/detector-comparison.json +69 -9
- package/docs/benchmarks/detector-comparison.md +10 -5
- package/docs/benchmarks/katfish-ko-latest.json +657 -0
- package/docs/benchmarks/katfish-ko-latest.md +77 -0
- package/docs/benchmarks/latest.json +1183 -108
- package/docs/benchmarks/latest.md +84 -60
- package/docs/benchmarks/lexicon-freshness-en-2026-05-22.json +1121 -0
- package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md +136 -0
- package/docs/benchmarks/rebaseline-latest.json +381 -0
- package/docs/benchmarks/rebaseline-latest.md +121 -0
- package/docs/benchmarks/register-stratified-latest.json +164 -0
- package/docs/benchmarks/register-stratified-latest.md +99 -0
- package/docs/benchmarks/register-stratified.md +43 -0
- package/docs/integrations/github-action.md +44 -11
- package/docs/integrations/playground.md +58 -0
- package/docs/integrations/pre-commit.md +5 -5
- package/docs/integrations/release.md +5 -3
- package/docs/integrations/static-sites.md +83 -0
- package/docs/research/2025-rebaseline-plan.md +71 -2
- package/docs/research/2026-rebaseline.md +102 -0
- package/docs/research/adversarial-mps.md +41 -0
- package/docs/research/ai-human-metrics.md +35 -23
- package/docs/research/human-eval-panel.md +42 -0
- package/docs/research/judge-agreement.md +24 -0
- package/docs/research/ko-2025-corpus-sources.md +135 -0
- package/docs/research/lexicon-freshness-audit.md +64 -0
- package/docs/research/zh-ja-lexicon-calibration.md +60 -0
- package/docs/social/patina-launch-copy.md +173 -100
- package/docs/social/patina-launch-execution.md +94 -0
- package/docs/social/patina-launch-korean-first.md +83 -0
- package/docs/social/signs-of-ai-writing.md +26 -0
- package/docs/social/signs-of-ai-writing_KR.md +26 -0
- package/lexicon/ai-en.md +21 -24
- package/lexicon/ai-ja.md +158 -0
- package/lexicon/ai-ko.md +9 -9
- package/lexicon/ai-zh.md +158 -0
- package/lexicon/provenance/ai-en.json +970 -0
- package/lexicon/provenance/ai-ja.json +542 -0
- package/lexicon/provenance/ai-ko.json +866 -0
- package/lexicon/provenance/ai-zh.json +542 -0
- package/package.json +49 -8
- package/patterns/en-communication.md +5 -0
- package/patterns/en-content.md +5 -0
- package/patterns/en-filler.md +5 -0
- package/patterns/en-language.md +29 -1
- package/patterns/en-structure.md +5 -0
- package/patterns/en-style.md +5 -0
- package/patterns/en-viral-hook.md +42 -2
- package/patterns/ja-communication.md +5 -0
- package/patterns/ja-content.md +5 -0
- package/patterns/ja-filler.md +5 -0
- package/patterns/ja-language.md +33 -1
- package/patterns/ja-structure.md +12 -0
- package/patterns/ja-style.md +5 -0
- package/patterns/ja-viral-hook.md +41 -2
- package/patterns/ko-communication.md +5 -0
- package/patterns/ko-content.md +5 -0
- package/patterns/ko-filler.md +5 -0
- package/patterns/ko-language.md +33 -1
- package/patterns/ko-structure.md +25 -6
- package/patterns/ko-style.md +5 -0
- package/patterns/ko-viral-hook.md +38 -2
- package/patterns/zh-communication.md +5 -0
- package/patterns/zh-content.md +5 -0
- package/patterns/zh-filler.md +5 -0
- package/patterns/zh-language.md +37 -1
- package/patterns/zh-structure.md +12 -0
- package/patterns/zh-style.md +5 -0
- package/patterns/zh-viral-hook.md +38 -2
- package/playground/README.md +55 -0
- package/playground/analytics.js +4 -0
- package/playground/analyzer.js +883 -0
- package/playground/app.js +157 -0
- package/playground/data/lexicons.js +343 -0
- package/playground/index.html +138 -0
- package/playground/styles.css +267 -0
- package/profiles/namuwiki.md +111 -0
- package/scripts/adversarial-mps-report.mjs +201 -0
- package/scripts/badge-json.mjs +79 -0
- package/scripts/benchmark-report.mjs +56 -9
- package/scripts/check-release-metadata.mjs +0 -2
- package/scripts/detector-comparison.mjs +7 -7
- package/scripts/generate-playground-data.mjs +77 -0
- package/scripts/katfish-calibration.mjs +464 -0
- package/scripts/lexicon-freshness.mjs +485 -0
- package/scripts/lint.mjs +1 -1
- package/scripts/precommit-score.mjs +4 -3
- package/scripts/prose-score.mjs +81 -5
- package/scripts/rebaseline-intake.mjs +242 -0
- package/scripts/rebaseline-score.mjs +268 -0
- package/scripts/rebaseline-summary.mjs +773 -0
- package/scripts/rebaseline-web-collect.mjs +410 -0
- package/scripts/update-benchmark-ranges.mjs +1 -0
- package/src/api.js +69 -105
- package/src/auth.js +50 -2
- package/src/backends/claude-cli.js +19 -4
- package/src/backends/codex-cli.js +19 -3
- package/src/backends/contract.js +230 -1
- package/src/backends/gemini-cli.js +18 -5
- package/src/backends/index.js +87 -12
- package/src/backends/kimi-cli.js +161 -0
- package/src/cli.js +577 -567
- package/src/commands/doctor.js +2 -2
- package/src/config.js +29 -0
- package/src/errors.js +53 -1
- package/src/features/discourse-tells.js +68 -0
- package/src/features/index.js +82 -8
- package/src/features/lexicon.js +40 -6
- package/src/features/markup-leakage.js +69 -0
- package/src/features/segment.js +41 -0
- package/src/features/signal-strength.js +81 -0
- package/src/features/stylometry.js +231 -1
- package/src/features/translationese.js +127 -0
- package/src/loader.js +76 -0
- package/src/logger.js +22 -23
- package/src/model-defaults.js +55 -0
- package/src/ouroboros.js +31 -0
- package/src/output.js +102 -90
- package/src/prompt-builder.js +103 -68
- package/src/providers.js +51 -4
- package/src/scoring.js +210 -2
- package/src/security.js +75 -0
- package/tests/fixtures/live-quality/en/public-docs-01.md +26 -0
- package/tests/fixtures/live-quality/ko/public-docs-01.md +26 -0
- package/tests/fixtures/suspect-zones/expected-ranges.json +207 -16
- package/tests/fixtures/suspect-zones/ja/ai/ja-ai-04-lexicon.md +11 -0
- package/tests/fixtures/suspect-zones/ja/natural/ja-nat-04-lexicon-cold.md +11 -0
- package/tests/fixtures/suspect-zones/ko/ai/ko-ai-02.md +4 -5
- package/tests/fixtures/suspect-zones/ko/ai/ko-ai-07-ko-diagnostic.md +11 -0
- package/tests/fixtures/suspect-zones/zh/ai/zh-ai-04-lexicon.md +11 -0
- package/tests/fixtures/suspect-zones/zh/natural/zh-nat-04-lexicon-cold.md +11 -0
- package/tests/quality/README.md +188 -11
- package/tests/quality/adversarial-mps/fixtures.jsonl +10 -0
- package/tests/quality/benchmark.mjs +39 -1
- package/tests/quality/dogfood.mjs +5 -3
- package/tests/quality/live-fixtures.jsonl +2 -0
- package/tests/quality/live-quality.mjs +596 -0
- package/tests/quality/ranking-metrics.mjs +136 -0
- package/tests/quality/rebaseline-manifest.example.jsonl +5 -0
- package/vercel.json +53 -0
- package/SKILL-MAX.md +0 -455
- package/docs/internal/HARNESS.md +0 -14
- package/docs/internal/README.md +0 -14
- package/docs/internal/WARP.md +0 -23
- package/patina-max/SKILL.md +0 -523
- package/patina-max/composite.py +0 -457
- package/src/cache.js +0 -106
- package/src/commands/init.js +0 -208
- package/src/manifest.js +0 -162
- package/src/max-mode.js +0 -207
@@ -0,0 +1,136 @@
+# Lexicon Freshness Lift Report
+
+- Language: en
+- Source: hape-en-gpt4o-vs-human-2026-05-22
+- Validated at: 2026-05-22
+- Input: artifacts/rebaseline-2025/private/hape-en.private.jsonl
+- Entries evaluated: 108
+- Decision summary: 88 keep / 20 drop
+- Gate: **PASS** (8290 hot docs, 8290 cold docs)
+- Source note: HAP-E MIT English paired corpus: GPT-4o 2024-08-06 continuations vs human chunk_2; raw text kept local/private, aggregate only committed.
+
+## Source provenance
+
+- <https://huggingface.co/datasets/browndw/human-ai-parallel-corpus>
+- <https://cmustatistics.github.io/data-repository/language/hap-e.html>
+- Public report policy: aggregate counts only; raw corpus rows stay local/private.
+
+## Register coverage
+
+| class | registers |
+|---|---|
+| hot | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
+| cold | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
+
+## Entry decisions
+
+| decision | kind | entry | hot docs | cold docs | lift | cold rate |
+|---|---|---|---:|---:|---:|---:|
+| drop | phrase | a host of | 9 | 14 | 0.64 | 0.17% |
+| drop | phrase | a wide range of | 26 | 47 | 0.55 | 0.57% |
+| drop | phrase | close the gap | 1 | 1 | 1 | 0.01% |
+| drop | phrase | driving force | 25 | 7 | 3.57 | 0.08% |
+| drop | phrase | end-to-end | 1 | 3 | 0.33 | 0.04% |
+| drop | phrase | gain a deeper understanding | 0 | 0 | 0 | 0.00% |
+| drop | phrase | in the age of | 6 | 4 | 1.5 | 0.05% |
+| drop | phrase | it is essential to | 32 | 13 | 2.46 | 0.16% |
+| drop | phrase | key drivers | 1 | 3 | 0.33 | 0.04% |
+| drop | phrase | on the other hand | 157 | 158 | 0.99 | 1.91% |
+| drop | phrase | play a key role | 1 | 5 | 0.2 | 0.06% |
+| drop | phrase | to ensure that | 140 | 43 | 3.26 | 0.52% |
+| drop | phrase | under the hood | 3 | 2 | 1.5 | 0.02% |
+| drop | strict | dimensions | 179 | 46 | 3.89 | 0.55% |
+| drop | strict | elevated | 74 | 61 | 1.21 | 0.74% |
+| drop | strict | enable | 140 | 82 | 1.71 | 0.99% |
+| drop | strict | framework | 380 | 129 | 2.95 | 1.56% |
+| drop | strict | state-of-the-art | 36 | 24 | 1.5 | 0.29% |
+| drop | strict | unleash | 45 | 13 | 3.46 | 0.16% |
+| drop | strict | workflow | 11 | 8 | 1.38 | 0.10% |
+| keep | phrase | a deeper dive | 7 | 0 | Infinity | 0.00% |
+| keep | phrase | a myriad of | 67 | 3 | 22.33 | 0.04% |
+| keep | phrase | a new chapter | 138 | 0 | Infinity | 0.00% |
+| keep | phrase | a new era | 130 | 9 | 14.44 | 0.11% |
+| keep | phrase | a new frontier | 8 | 0 | Infinity | 0.00% |
+| keep | phrase | a plethora of | 26 | 5 | 5.2 | 0.06% |
+| keep | phrase | a robust framework | 28 | 0 | Infinity | 0.00% |
+| keep | phrase | a wide array of | 13 | 1 | 13 | 0.01% |
+| keep | phrase | at its core | 48 | 2 | 24 | 0.02% |
+| keep | phrase | at the forefront | 95 | 2 | 47.5 | 0.02% |
+| keep | phrase | at the heart of | 143 | 18 | 7.94 | 0.22% |
+| keep | phrase | best practices | 51 | 6 | 8.5 | 0.07% |
+| keep | phrase | bridge the gap | 94 | 3 | 31.33 | 0.04% |
+| keep | phrase | comprehensive approach | 33 | 1 | 33 | 0.01% |
+| keep | phrase | continuous improvement | 28 | 1 | 28 | 0.01% |
+| keep | phrase | ever-changing | 74 | 0 | Infinity | 0.00% |
+| keep | phrase | ever-evolving | 144 | 0 | Infinity | 0.00% |
+| keep | phrase | fast-paced | 53 | 3 | 17.67 | 0.04% |
+| keep | phrase | gain valuable insights | 2 | 0 | Infinity | 0.00% |
+| keep | phrase | glean insights | 3 | 0 | Infinity | 0.00% |
+| keep | phrase | harness the power | 8 | 0 | Infinity | 0.00% |
+| keep | phrase | holistic approach | 128 | 4 | 32 | 0.05% |
+| keep | phrase | in the digital age | 23 | 0 | Infinity | 0.00% |
+| keep | phrase | in the modern era | 7 | 1 | 7 | 0.01% |
+| keep | phrase | in today's | 69 | 17 | 4.06 | 0.21% |
+| keep | phrase | key insights | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | key takeaways | 2 | 0 | Infinity | 0.00% |
+| keep | phrase | pave the path | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | pave the way | 133 | 1 | 133 | 0.01% |
+| keep | phrase | play a crucial role | 75 | 4 | 18.75 | 0.05% |
+| keep | phrase | plays a vital role | 11 | 1 | 11 | 0.01% |
+| keep | phrase | rapidly changing | 42 | 1 | 42 | 0.01% |
+| keep | phrase | rapidly evolving | 32 | 1 | 32 | 0.01% |
+| keep | phrase | realize the potential | 3 | 0 | Infinity | 0.00% |
+| keep | phrase | the bigger picture | 22 | 2 | 11 | 0.02% |
+| keep | phrase | the competitive landscape | 1 | 0 | Infinity | 0.00% |
+| keep | phrase | the digital landscape | 13 | 0 | Infinity | 0.00% |
+| keep | phrase | the future of | 212 | 24 | 8.83 | 0.29% |
+| keep | phrase | the landscape of | 134 | 1 | 134 | 0.01% |
+| keep | phrase | the realm of | 224 | 7 | 32 | 0.08% |
+| keep | phrase | the regulatory landscape | 4 | 0 | Infinity | 0.00% |
+| keep | phrase | the world of | 241 | 42 | 5.74 | 0.51% |
+| keep | phrase | unlock the potential | 6 | 0 | Infinity | 0.00% |
+| keep | phrase | usher in | 37 | 6 | 6.17 | 0.07% |
+| keep | phrase | valuable insights | 124 | 3 | 41.33 | 0.04% |
+| keep | strict | accelerate | 69 | 17 | 4.06 | 0.21% |
+| keep | strict | actionable | 104 | 3 | 34.67 | 0.04% |
+| keep | strict | align | 370 | 17 | 21.76 | 0.21% |
+| keep | strict | alignment | 135 | 23 | 5.87 | 0.28% |
+| keep | strict | amplify | 117 | 5 | 23.4 | 0.06% |
+| keep | strict | bespoke | 69 | 8 | 8.63 | 0.10% |
+| keep | strict | bolster | 175 | 12 | 14.58 | 0.14% |
+| keep | strict | catalyst | 161 | 26 | 6.19 | 0.31% |
+| keep | strict | compelling | 340 | 27 | 12.59 | 0.33% |
+| keep | strict | curated | 106 | 7 | 15.14 | 0.08% |
+| keep | strict | cutting-edge | 165 | 1 | 165 | 0.01% |
+| keep | strict | dynamic | 765 | 110 | 6.95 | 1.33% |
+| keep | strict | ecosystem | 205 | 48 | 4.27 | 0.58% |
+| keep | strict | elevate | 106 | 4 | 26.5 | 0.05% |
+| keep | strict | empower | 142 | 4 | 35.5 | 0.05% |
+| keep | strict | empowering | 166 | 7 | 23.71 | 0.08% |
+| keep | strict | enabling | 263 | 39 | 6.74 | 0.47% |
+| keep | strict | envision | 117 | 9 | 13 | 0.11% |
+| keep | strict | ethical | 259 | 25 | 10.36 | 0.30% |
+| keep | strict | harness | 218 | 14 | 15.57 | 0.17% |
+| keep | strict | impactful | 83 | 3 | 27.67 | 0.04% |
+| keep | strict | inclusive | 205 | 18 | 11.39 | 0.22% |
+| keep | strict | inflection | 12 | 0 | Infinity | 0.00% |
+| keep | strict | meaningful | 305 | 48 | 6.35 | 0.58% |
+| keep | strict | modalities | 61 | 15 | 4.07 | 0.18% |
+| keep | strict | pivot | 184 | 5 | 36.8 | 0.06% |
+| keep | strict | prioritize | 239 | 3 | 79.67 | 0.04% |
+| keep | strict | reimagine | 22 | 0 | Infinity | 0.00% |
+| keep | strict | rethink | 45 | 3 | 15 | 0.04% |
+| keep | strict | scalable | 34 | 5 | 6.8 | 0.06% |
+| keep | strict | seamless | 176 | 4 | 44 | 0.05% |
+| keep | strict | seamlessly | 352 | 9 | 39.11 | 0.11% |
+| keep | strict | skillset | 4 | 0 | Infinity | 0.00% |
+| keep | strict | streamline | 42 | 3 | 14 | 0.04% |
+| keep | strict | streamlined | 26 | 3 | 8.67 | 0.04% |
+| keep | strict | sustainable | 690 | 67 | 10.3 | 0.81% |
+| keep | strict | thoughtful | 228 | 33 | 6.91 | 0.40% |
+| keep | strict | thrive | 279 | 18 | 15.5 | 0.22% |
+| keep | strict | thriving | 137 | 6 | 22.83 | 0.07% |
+| keep | strict | toolkit | 39 | 3 | 13 | 0.04% |
+| keep | strict | transformative | 417 | 5 | 83.4 | 0.06% |
+| keep | strict | unlock | 165 | 15 | 11 | 0.18% |
+| keep | strict | vibrant | 989 | 13 | 76.08 | 0.16% |
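The lift and cold-rate columns in the entry table appear to follow directly from the document counts. Because the hot and cold sets each hold 8290 documents, a ratio of counts equals a ratio of rates. The sketch below illustrates that reading; it is not the code in `scripts/lexicon-freshness.mjs`, the function names are hypothetical, and the keep/drop rule is left out because the report does not state it.

```python
import math

# Cold-set size taken from the report's gate line (8290 cold docs).
COLD_DOCS_TOTAL = 8290


def lift(hot_docs: int, cold_docs: int) -> float:
    """Hot-to-cold ratio of document counts.

    Assumption: equal-sized hot and cold sets, so counts can be divided
    directly. The two zero-denominator cases mirror what the table shows
    (Infinity when only hot docs match, 0 when nothing matches).
    """
    if cold_docs == 0:
        return math.inf if hot_docs > 0 else 0.0
    return round(hot_docs / cold_docs, 2)


def cold_rate_percent(cold_docs: int, total: int = COLD_DOCS_TOTAL) -> float:
    """Share of cold documents containing the entry, as a percentage."""
    return round(100 * cold_docs / total, 2)


# A few (hot docs, cold docs) pairs copied from the table.
rows = {
    "a host of": (9, 14),
    "a myriad of": (67, 3),
    "a deeper dive": (7, 0),
    "on the other hand": (157, 158),
}

for entry, (hot, cold) in rows.items():
    print(f"{entry}: lift={lift(hot, cold)} cold rate={cold_rate_percent(cold)}%")
```

Python's `round` resolves exact ties toward the even digit, so an exact tie such as 69/8 can differ in the last digit from the value printed in the report.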
@@ -0,0 +1,381 @@
+{
+  "schemaVersion": 1,
+  "generatedAt": "2026-05-21T18:13:21.576Z",
+  "input": "artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl",
+  "targets": {
+    "protocolPerLanguageClassRegister": 25,
+    "claimPerCell": 100,
+    "claimLanguages": 2,
+    "claimGeneratorFamilies": 3
+  },
+  "totalRecords": 800,
+  "byLanguage": {
+    "ko": 400,
+    "en": 400
+  },
+  "byClass": {
+    "natural-human": 200,
+    "ai-like": 600
+  },
+  "byRegister": {
+    "product-doc": 140,
+    "academic-summary": 190,
+    "chat-update": 140,
+    "blog": 190,
+    "technical-how-to": 140
+  },
+  "byModelFamily": {
+    "human-reference": 200,
+    "claude-family": 200,
+    "gemini-family": 200,
+    "gpt-family": 200
+  },
+  "protocolCoverage": {
+    "totalCells": 80,
+    "populatedCells": 17,
+    "emptyCells": 63,
+    "cellsMeetingTarget": 12,
+    "underfilledCells": [
+      {
+        "key": "ko|natural-human|blog",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|academic-summary",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|product-doc",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|chat-update",
+        "count": 20
+      },
+      {
+        "key": "ko|natural-human|technical-how-to",
+        "count": 20
+      }
+    ]
+  },
+  "claimGate": {
+    "ready": true,
+    "blockers": [],
+    "qualifiedPositiveCells": [
+      {
+        "key": "en|claude-family",
+        "count": 100
+      },
+      {
+        "key": "en|gemini-family",
+        "count": 100
+      },
+      {
+        "key": "en|gpt-family",
+        "count": 100
+      },
+      {
+        "key": "ko|claude-family",
+        "count": 100
+      },
+      {
+        "key": "ko|gemini-family",
+        "count": 100
+      },
+      {
+        "key": "ko|gpt-family",
+        "count": 100
+      }
+    ],
+    "qualifiedNaturalCells": [
+      {
+        "key": "ko",
+        "count": 100
+      },
+      {
+        "key": "en",
+        "count": 100
+      }
+    ]
+  },
+  "metrics": {
+    "tp": 404,
+    "fp": 32,
+    "fn": 196,
+    "tn": 168,
+    "total": 800,
+    "accuracy": 0.715,
+    "precision": 0.927,
+    "recall": 0.673,
+    "f1": 0.78,
+    "falsePositiveRate": 0.16,
+    "falseNegativeRate": 0.327,
+    "accuracyCi": {
+      "low": 0.683,
+      "high": 0.745,
+      "method": "Wilson score interval, 95%"
+    },
+    "recallCi": {
+      "low": 0.635,
+      "high": 0.71,
+      "method": "Wilson score interval, 95%"
+    },
+    "falsePositiveRateCi": {
+      "low": 0.116,
+      "high": 0.217,
+      "method": "Wilson score interval, 95%"
+    }
+  },
+  "catchByLanguageFamily": {
+    "en|claude-family": {
+      "language": "en",
+      "modelFamily": "claude-family",
+      "n": 100,
+      "caught": 74,
+      "missed": 26,
+      "catchRate": 0.74,
+      "catchRateCi": {
+        "low": 0.646,
+        "high": 0.816,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "en|gemini-family": {
+      "language": "en",
+      "modelFamily": "gemini-family",
+      "n": 100,
+      "caught": 79,
+      "missed": 21,
+      "catchRate": 0.79,
+      "catchRateCi": {
+        "low": 0.7,
+        "high": 0.858,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "en|gpt-family": {
+      "language": "en",
+      "modelFamily": "gpt-family",
+      "n": 100,
+      "caught": 77,
+      "missed": 23,
+      "catchRate": 0.77,
+      "catchRateCi": {
+        "low": 0.678,
+        "high": 0.842,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|claude-family": {
+      "language": "ko",
+      "modelFamily": "claude-family",
+      "n": 100,
+      "caught": 68,
+      "missed": 32,
+      "catchRate": 0.68,
+      "catchRateCi": {
+        "low": 0.583,
+        "high": 0.763,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|gemini-family": {
+      "language": "ko",
+      "modelFamily": "gemini-family",
+      "n": 100,
+      "caught": 62,
+      "missed": 38,
+      "catchRate": 0.62,
+      "catchRateCi": {
+        "low": 0.522,
+        "high": 0.709,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko|gpt-family": {
+      "language": "ko",
+      "modelFamily": "gpt-family",
+      "n": 100,
+      "caught": 44,
+      "missed": 56,
+      "catchRate": 0.44,
+      "catchRateCi": {
+        "low": 0.347,
+        "high": 0.538,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "falsePositiveByLanguage": {
+    "en": {
+      "language": "en",
+      "n": 100,
+      "falsePositives": 14,
+      "trueNegatives": 86,
+      "falsePositiveRate": 0.14,
+      "falsePositiveRateCi": {
+        "low": 0.085,
+        "high": 0.221,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "ko": {
+      "language": "ko",
+      "n": 100,
+      "falsePositives": 18,
+      "trueNegatives": 82,
+      "falsePositiveRate": 0.18,
+      "falsePositiveRateCi": {
+        "low": 0.117,
+        "high": 0.267,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "metricsByRegister": {
+    "academic-summary": {
+      "tp": 89,
+      "fp": 18,
+      "fn": 31,
+      "tn": 52,
+      "total": 190,
+      "accuracy": 0.742,
+      "precision": 0.832,
+      "recall": 0.742,
+      "f1": 0.784,
+      "falsePositiveRate": 0.257,
+      "falseNegativeRate": 0.258,
+      "accuracyCi": {
+        "low": 0.676,
+        "high": 0.799,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.657,
+        "high": 0.812,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.169,
+        "high": 0.37,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "blog": {
+      "tp": 70,
+      "fp": 6,
+      "fn": 50,
+      "tn": 64,
+      "total": 190,
+      "accuracy": 0.705,
+      "precision": 0.921,
+      "recall": 0.583,
+      "f1": 0.714,
+      "falsePositiveRate": 0.086,
+      "falseNegativeRate": 0.417,
+      "accuracyCi": {
+        "low": 0.637,
+        "high": 0.766,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.494,
+        "high": 0.668,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.04,
+        "high": 0.175,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "chat-update": {
+      "tp": 61,
+      "fp": 0,
+      "fn": 59,
+      "tn": 20,
+      "total": 140,
+      "accuracy": 0.579,
+      "precision": 1,
+      "recall": 0.508,
+      "f1": 0.674,
+      "falsePositiveRate": 0,
+      "falseNegativeRate": 0.492,
+      "accuracyCi": {
+        "low": 0.496,
+        "high": 0.657,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.42,
+        "high": 0.596,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0,
+        "high": 0.161,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "product-doc": {
+      "tp": 94,
+      "fp": 2,
+      "fn": 26,
+      "tn": 18,
+      "total": 140,
+      "accuracy": 0.8,
+      "precision": 0.979,
+      "recall": 0.783,
+      "f1": 0.87,
+      "falsePositiveRate": 0.1,
+      "falseNegativeRate": 0.217,
+      "accuracyCi": {
+        "low": 0.726,
+        "high": 0.858,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.701,
+        "high": 0.848,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.028,
+        "high": 0.301,
+        "method": "Wilson score interval, 95%"
+      }
+    },
+    "technical-how-to": {
+      "tp": 90,
+      "fp": 6,
+      "fn": 30,
+      "tn": 14,
+      "total": 140,
+      "accuracy": 0.743,
+      "precision": 0.938,
+      "recall": 0.75,
+      "f1": 0.833,
+      "falsePositiveRate": 0.3,
+      "falseNegativeRate": 0.25,
+      "accuracyCi": {
+        "low": 0.665,
+        "high": 0.808,
+        "method": "Wilson score interval, 95%"
+      },
+      "recallCi": {
+        "low": 0.666,
+        "high": 0.819,
+        "method": "Wilson score interval, 95%"
+      },
+      "falsePositiveRateCi": {
+        "low": 0.145,
+        "high": 0.519,
+        "method": "Wilson score interval, 95%"
+      }
+    }
+  },
+  "validation": {
+    "errors": [],
+    "warnings": []
+  }
+}
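The top-level `metrics` object in the JSON above can be rechecked from its four confusion counts using the standard definitions. The sketch below is a reader-side check, not the package's scoring code; `confusion_metrics` is a hypothetical helper, and rounding to three decimals is an assumption based on how the values are printed.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics, rounded to three decimals."""
    total = tp + fp + fn + tn
    positives = tp + fn  # records labelled ai-like
    negatives = fp + tn  # records labelled natural-human

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / positives if positives else 0.0
    f1 = (2 * tp) / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    values = {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "falsePositiveRate": fp / negatives if negatives else 0.0,
        "falseNegativeRate": fn / positives if positives else 0.0,
    }
    return {name: round(value, 3) for name, value in values.items()}


# Counts copied from the "metrics" object: tp=404, fp=32, fn=196, tn=168.
overall = confusion_metrics(tp=404, fp=32, fn=196, tn=168)
for name, value in overall.items():
    print(f"{name}: {value}")
```

Run on the published counts, this reproduces the accuracy, precision, recall, F1, and error rates recorded in the file.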
@@ -0,0 +1,121 @@
+# Rebaseline Manifest Summary
+
+- Generated at: 2026-05-21T18:13:21.576Z
+- Input: `artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl`
+- Records: 800
+- Protocol target: 25 samples per language × class × register cell
+- Public claim target: 100 samples per claim cell, 2+ languages, 3+ generator families
+
+## Validation
+
+Validation: **PASS**
+
+## Coverage snapshot
+
+### By language
+
+| value | n |
+|---|---:|
+| ko | 400 |
+| en | 400 |
+| zh | 0 |
+| ja | 0 |
+
+### By class
+
+| value | n |
+|---|---:|
+| ai-like | 600 |
+| natural-human | 200 |
+| lightly-edited-ai | 0 |
+| heavily-edited-ai | 0 |
+
+### By register
+
+| value | n |
+|---|---:|
+| blog | 190 |
+| academic-summary | 190 |
+| product-doc | 140 |
+| chat-update | 140 |
+| technical-how-to | 140 |
+
+### By model family
+
+| value | n |
+|---|---:|
+| gpt-family | 200 |
+| claude-family | 200 |
+| gemini-family | 200 |
+| open-weight | 0 |
+| human-reference | 200 |
+
+## Protocol matrix
+
+- Populated language × class × register cells: 17/80
+- Cells meeting 25+ samples: 12
+- Empty cells: 63
+- Underfilled populated cells: 5
+
+| cell | n |
+|---|---:|
+| ko × natural-human × blog | 20 |
+| ko × natural-human × academic-summary | 20 |
+| ko × natural-human × product-doc | 20 |
+| ko × natural-human × chat-update | 20 |
+| ko × natural-human × technical-how-to | 20 |
+
+## Public performance claim gate
+
+Public performance claim: **READY**
+
+Gate conditions met by this manifest.
+
+| claim-gate count | value |
+|---|---:|
+| qualified positive cells (language × generator family, n≥100) | 6 |
+| qualified natural-language cells (language, n≥100) | 2 |
+| outcome rows with expected/predicted labels | 800 |
+
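One plausible reading of the claim gate, given the stated target of 100 samples per claim cell, 2+ languages, and 3+ generator families, is sketched below. The exact rule used by the package is not shown in this diff, so `claim_gate` and its blocker messages are hypothetical; only the cell keys and counts come from the manifest.

```python
def claim_gate(positive_cells, natural_cells,
               per_cell=100, min_languages=2, min_families=3):
    """Return a readiness verdict under an assumed gate rule.

    Assumed rule: a language counts only if it has a qualified natural
    cell and at least one qualified positive cell; families are counted
    across all qualified positive cells.
    """
    qualified = [key for key, n in positive_cells.items() if n >= per_cell]
    positive_languages = {key.split("|")[0] for key in qualified}
    families = {key.split("|")[1] for key in qualified}
    natural_languages = {lang for lang, n in natural_cells.items() if n >= per_cell}

    covered = positive_languages & natural_languages
    blockers = []
    if len(covered) < min_languages:
        blockers.append(f"only {len(covered)} qualified language(s)")
    if len(families) < min_families:
        blockers.append(f"only {len(families)} qualified generator family(ies)")
    return {"ready": not blockers, "blockers": blockers}


# Cell counts as listed in the manifest (language|generator family).
positive = {
    "en|claude-family": 100, "en|gemini-family": 100, "en|gpt-family": 100,
    "ko|claude-family": 100, "ko|gemini-family": 100, "ko|gpt-family": 100,
}
natural = {"ko": 100, "en": 100}

print(claim_gate(positive, natural))
```

Under this assumed rule the published counts pass with no blockers, which agrees with the READY status reported by the manifest.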
+## Outcome metrics
+
+| metric | value |
+|---|---:|
+| accuracy | 71.5% |
+| accuracy CI | 68.3%–74.5% |
+| precision | 92.7% |
+| recall | 67.3% |
+| recall CI | 63.5%–71.0% |
+| F1 | 0.780 |
+| false positive rate | 16.0% |
+| false positive rate CI | 11.6%–21.7% |
+| false negative rate | 32.7% |
+| TP/FP/FN/TN | 404/32/196/168 |
+
+### Catch rate by language × model family
+
+| language | model family | n | catch rate | 95% CI | caught/missed |
+|---|---|---:|---:|---:|---:|
+| en | claude-family | 100 | 74.0% | 64.6%–81.6% | 74/26 |
+| en | gemini-family | 100 | 79.0% | 70.0%–85.8% | 79/21 |
+| en | gpt-family | 100 | 77.0% | 67.8%–84.2% | 77/23 |
+| ko | claude-family | 100 | 68.0% | 58.3%–76.3% | 68/32 |
+| ko | gemini-family | 100 | 62.0% | 52.2%–70.9% | 62/38 |
+| ko | gpt-family | 100 | 44.0% | 34.7%–53.8% | 44/56 |
+
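The CI columns are labelled as 95% Wilson score intervals in the accompanying JSON. A minimal sketch of that interval follows, using the textbook formula; `wilson_interval` is a hypothetical name, and the clamping to [0, 1] is an assumption added to keep floating-point noise out of the zero-count case.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    # Two-sided critical value, about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


# Overall accuracy: 572 correct out of 800 (TP 404 + TN 168).
low, high = wilson_interval(572, 800)
print(f"accuracy CI: {low:.3f} to {high:.3f}")

# ko x gpt-family catch rate: 44 caught out of 100.
low, high = wilson_interval(44, 100)
print(f"ko|gpt-family catch-rate CI: {low:.3f} to {high:.3f}")
```

Unlike the simpler normal approximation, the Wilson interval stays inside [0, 1] and still gives a non-trivial upper bound when the observed count is zero, which matters for small cells such as a register with no false positives.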
+### False-positive rate by language
+
+| language | n | false-positive rate | 95% CI | FP/TN |
+|---|---:|---:|---:|---:|
+| en | 100 | 14.0% | 8.5%–22.1% | 14/86 |
+| ko | 100 | 18.0% | 11.7%–26.7% | 18/82 |
+
+### By register
+
+| register | n | FP rate | FN rate | TP/FP/FN/TN |
+|---|---:|---:|---:|---:|
+| blog | 190 | 8.6% | 41.7% | 70/6/50/64 |
+| academic-summary | 190 | 25.7% | 25.8% | 89/18/31/52 |
+| product-doc | 140 | 10.0% | 21.7% | 94/2/26/18 |
+| chat-update | 140 | 0.0% | 49.2% | 61/0/59/20 |
+| technical-how-to | 140 | 30.0% | 25.0% | 90/6/30/14 |
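The per-register rows can be cross-checked in two ways: the FP and FN rates should follow from each row's TP/FP/FN/TN counts, and the five rows should add up to the overall 404/32/196/168 split. The sketch below does both; the helper names are hypothetical and the one-decimal percentage rounding is an assumption based on how the table is printed.

```python
# TP, FP, FN, TN per register, copied from the "By register" table.
REGISTERS = {
    "blog": (70, 6, 50, 64),
    "academic-summary": (89, 18, 31, 52),
    "product-doc": (94, 2, 26, 18),
    "chat-update": (61, 0, 59, 20),
    "technical-how-to": (90, 6, 30, 14),
}


def error_rates(tp: int, fp: int, fn: int, tn: int):
    """Return (FP rate, FN rate) as percentages with one decimal."""
    fp_rate = 100 * fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = 100 * fn / (fn + tp) if (fn + tp) else 0.0
    return (round(fp_rate, 1), round(fn_rate, 1))


def column_totals(rows: dict):
    """Sum each confusion count across all registers."""
    return tuple(sum(counts[i] for counts in rows.values()) for i in range(4))


for register, counts in REGISTERS.items():
    fp_rate, fn_rate = error_rates(*counts)
    print(f"{register}: n={sum(counts)} FP rate={fp_rate}% FN rate={fn_rate}%")

print("totals (TP, FP, FN, TN):", column_totals(REGISTERS))
```

The FP rates for product-doc, chat-update, and technical-how-to rest on only 20 human-written samples each, so a difference of one or two documents moves them by 5 to 10 points.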