patina-cli 3.11.0 → 4.0.0

Files changed (193)
  1. package/.patina.default.yaml +29 -29
  2. package/CHANGELOG.md +53 -0
  3. package/NOTICE +21 -0
  4. package/README.md +117 -224
  5. package/README_JA.md +134 -77
  6. package/README_KR.md +132 -74
  7. package/README_ZH.md +137 -80
  8. package/SKILL.md +11 -20
  9. package/artifacts/rebaseline-2025/README.md +147 -0
  10. package/artifacts/rebaseline-2025/human-controls.public.jsonl +250 -0
  11. package/artifacts/rebaseline-2025/intake.example.jsonl +2 -0
  12. package/artifacts/rebaseline-2025/intake.local.example.jsonl +25 -0
  13. package/artifacts/rebaseline-2025/prompts.template.jsonl +7 -0
  14. package/artifacts/rebaseline-2025/sources.ko-public.jsonl +39 -0
  15. package/assets/brand/patina-badge.svg +18 -0
  16. package/assets/brand/patina-mark.svg +8 -0
  17. package/assets/demo/README.md +79 -0
  18. package/core/scoring.md +12 -12
  19. package/core/standalone-prompt.md +3 -1
  20. package/core/stylometry.md +93 -22
  21. package/docs/API.md +1554 -0
  22. package/docs/AUTHENTICATION.md +50 -26
  23. package/docs/AUTHENTICATION_KR.md +54 -29
  24. package/docs/BRANDING.md +9 -8
  25. package/docs/CLI.md +55 -14
  26. package/docs/COOKBOOK.md +8 -21
  27. package/docs/DEMO.md +32 -5
  28. package/docs/EXIT-CODES.md +2 -3
  29. package/docs/FALSE-POSITIVES.md +63 -0
  30. package/docs/FAQ.md +9 -1
  31. package/docs/FAQ_KR.md +3 -1
  32. package/docs/FLAG-PARITY.md +33 -47
  33. package/docs/ISSUE-WAVES.md +57 -0
  34. package/docs/PATTERNS-EN.md +67 -3
  35. package/docs/PATTERNS-JA.md +68 -2
  36. package/docs/PATTERNS-KO.md +70 -7
  37. package/docs/PATTERNS-ZH.md +67 -3
  38. package/docs/PATTERNS.md +5 -5
  39. package/docs/RESEARCH-DOCS-PLATFORM.md +54 -0
  40. package/docs/ROADMAP.md +46 -66
  41. package/docs/TRANSLATIONESE-KO.md +51 -0
  42. package/docs/audits/2026-05-deep-research.md +3 -1
  43. package/docs/benchmarks/README.md +51 -0
  44. package/docs/benchmarks/detector-comparison.json +69 -9
  45. package/docs/benchmarks/detector-comparison.md +10 -5
  46. package/docs/benchmarks/katfish-ko-latest.json +657 -0
  47. package/docs/benchmarks/katfish-ko-latest.md +77 -0
  48. package/docs/benchmarks/latest.json +1183 -108
  49. package/docs/benchmarks/latest.md +84 -60
  50. package/docs/benchmarks/lexicon-freshness-en-2026-05-22.json +1121 -0
  51. package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md +136 -0
  52. package/docs/benchmarks/rebaseline-latest.json +381 -0
  53. package/docs/benchmarks/rebaseline-latest.md +121 -0
  54. package/docs/benchmarks/register-stratified-latest.json +164 -0
  55. package/docs/benchmarks/register-stratified-latest.md +99 -0
  56. package/docs/benchmarks/register-stratified.md +43 -0
  57. package/docs/integrations/github-action.md +44 -11
  58. package/docs/integrations/playground.md +58 -0
  59. package/docs/integrations/pre-commit.md +5 -5
  60. package/docs/integrations/release.md +5 -3
  61. package/docs/integrations/static-sites.md +83 -0
  62. package/docs/research/2025-rebaseline-plan.md +71 -2
  63. package/docs/research/2026-rebaseline.md +102 -0
  64. package/docs/research/adversarial-mps.md +41 -0
  65. package/docs/research/ai-human-metrics.md +35 -23
  66. package/docs/research/human-eval-panel.md +42 -0
  67. package/docs/research/judge-agreement.md +24 -0
  68. package/docs/research/ko-2025-corpus-sources.md +135 -0
  69. package/docs/research/lexicon-freshness-audit.md +64 -0
  70. package/docs/research/zh-ja-lexicon-calibration.md +60 -0
  71. package/docs/social/patina-launch-copy.md +173 -100
  72. package/docs/social/patina-launch-execution.md +94 -0
  73. package/docs/social/patina-launch-korean-first.md +83 -0
  74. package/docs/social/signs-of-ai-writing.md +26 -0
  75. package/docs/social/signs-of-ai-writing_KR.md +26 -0
  76. package/lexicon/ai-en.md +21 -24
  77. package/lexicon/ai-ja.md +158 -0
  78. package/lexicon/ai-ko.md +9 -9
  79. package/lexicon/ai-zh.md +158 -0
  80. package/lexicon/provenance/ai-en.json +970 -0
  81. package/lexicon/provenance/ai-ja.json +542 -0
  82. package/lexicon/provenance/ai-ko.json +866 -0
  83. package/lexicon/provenance/ai-zh.json +542 -0
  84. package/package.json +49 -8
  85. package/patterns/en-communication.md +5 -0
  86. package/patterns/en-content.md +5 -0
  87. package/patterns/en-filler.md +5 -0
  88. package/patterns/en-language.md +29 -1
  89. package/patterns/en-structure.md +5 -0
  90. package/patterns/en-style.md +5 -0
  91. package/patterns/en-viral-hook.md +42 -2
  92. package/patterns/ja-communication.md +5 -0
  93. package/patterns/ja-content.md +5 -0
  94. package/patterns/ja-filler.md +5 -0
  95. package/patterns/ja-language.md +33 -1
  96. package/patterns/ja-structure.md +12 -0
  97. package/patterns/ja-style.md +5 -0
  98. package/patterns/ja-viral-hook.md +41 -2
  99. package/patterns/ko-communication.md +5 -0
  100. package/patterns/ko-content.md +5 -0
  101. package/patterns/ko-filler.md +5 -0
  102. package/patterns/ko-language.md +33 -1
  103. package/patterns/ko-structure.md +25 -6
  104. package/patterns/ko-style.md +5 -0
  105. package/patterns/ko-viral-hook.md +38 -2
  106. package/patterns/zh-communication.md +5 -0
  107. package/patterns/zh-content.md +5 -0
  108. package/patterns/zh-filler.md +5 -0
  109. package/patterns/zh-language.md +37 -1
  110. package/patterns/zh-structure.md +12 -0
  111. package/patterns/zh-style.md +5 -0
  112. package/patterns/zh-viral-hook.md +38 -2
  113. package/playground/README.md +55 -0
  114. package/playground/analytics.js +4 -0
  115. package/playground/analyzer.js +883 -0
  116. package/playground/app.js +157 -0
  117. package/playground/data/lexicons.js +343 -0
  118. package/playground/index.html +138 -0
  119. package/playground/styles.css +267 -0
  120. package/profiles/namuwiki.md +111 -0
  121. package/scripts/adversarial-mps-report.mjs +201 -0
  122. package/scripts/badge-json.mjs +79 -0
  123. package/scripts/benchmark-report.mjs +56 -9
  124. package/scripts/check-release-metadata.mjs +0 -2
  125. package/scripts/detector-comparison.mjs +7 -7
  126. package/scripts/generate-playground-data.mjs +77 -0
  127. package/scripts/katfish-calibration.mjs +464 -0
  128. package/scripts/lexicon-freshness.mjs +485 -0
  129. package/scripts/lint.mjs +1 -1
  130. package/scripts/precommit-score.mjs +4 -3
  131. package/scripts/prose-score.mjs +81 -5
  132. package/scripts/rebaseline-intake.mjs +242 -0
  133. package/scripts/rebaseline-score.mjs +268 -0
  134. package/scripts/rebaseline-summary.mjs +773 -0
  135. package/scripts/rebaseline-web-collect.mjs +410 -0
  136. package/scripts/update-benchmark-ranges.mjs +1 -0
  137. package/src/api.js +69 -105
  138. package/src/auth.js +50 -2
  139. package/src/backends/claude-cli.js +19 -4
  140. package/src/backends/codex-cli.js +19 -3
  141. package/src/backends/contract.js +230 -1
  142. package/src/backends/gemini-cli.js +18 -5
  143. package/src/backends/index.js +87 -12
  144. package/src/backends/kimi-cli.js +161 -0
  145. package/src/cli.js +577 -567
  146. package/src/commands/doctor.js +2 -2
  147. package/src/config.js +29 -0
  148. package/src/errors.js +53 -1
  149. package/src/features/discourse-tells.js +68 -0
  150. package/src/features/index.js +82 -8
  151. package/src/features/lexicon.js +40 -6
  152. package/src/features/markup-leakage.js +69 -0
  153. package/src/features/segment.js +41 -0
  154. package/src/features/signal-strength.js +81 -0
  155. package/src/features/stylometry.js +231 -1
  156. package/src/features/translationese.js +127 -0
  157. package/src/loader.js +76 -0
  158. package/src/logger.js +22 -23
  159. package/src/model-defaults.js +55 -0
  160. package/src/ouroboros.js +31 -0
  161. package/src/output.js +102 -90
  162. package/src/prompt-builder.js +103 -68
  163. package/src/providers.js +51 -4
  164. package/src/scoring.js +210 -2
  165. package/src/security.js +75 -0
  166. package/tests/fixtures/live-quality/en/public-docs-01.md +26 -0
  167. package/tests/fixtures/live-quality/ko/public-docs-01.md +26 -0
  168. package/tests/fixtures/suspect-zones/expected-ranges.json +207 -16
  169. package/tests/fixtures/suspect-zones/ja/ai/ja-ai-04-lexicon.md +11 -0
  170. package/tests/fixtures/suspect-zones/ja/natural/ja-nat-04-lexicon-cold.md +11 -0
  171. package/tests/fixtures/suspect-zones/ko/ai/ko-ai-02.md +4 -5
  172. package/tests/fixtures/suspect-zones/ko/ai/ko-ai-07-ko-diagnostic.md +11 -0
  173. package/tests/fixtures/suspect-zones/zh/ai/zh-ai-04-lexicon.md +11 -0
  174. package/tests/fixtures/suspect-zones/zh/natural/zh-nat-04-lexicon-cold.md +11 -0
  175. package/tests/quality/README.md +188 -11
  176. package/tests/quality/adversarial-mps/fixtures.jsonl +10 -0
  177. package/tests/quality/benchmark.mjs +39 -1
  178. package/tests/quality/dogfood.mjs +5 -3
  179. package/tests/quality/live-fixtures.jsonl +2 -0
  180. package/tests/quality/live-quality.mjs +596 -0
  181. package/tests/quality/ranking-metrics.mjs +136 -0
  182. package/tests/quality/rebaseline-manifest.example.jsonl +5 -0
  183. package/vercel.json +53 -0
  184. package/SKILL-MAX.md +0 -455
  185. package/docs/internal/HARNESS.md +0 -14
  186. package/docs/internal/README.md +0 -14
  187. package/docs/internal/WARP.md +0 -23
  188. package/patina-max/SKILL.md +0 -523
  189. package/patina-max/composite.py +0 -457
  190. package/src/cache.js +0 -106
  191. package/src/commands/init.js +0 -208
  192. package/src/manifest.js +0 -162
  193. package/src/max-mode.js +0 -207
@@ -0,0 +1,136 @@
1
+ # Lexicon Freshness Lift Report
2
+
3
+ - Language: en
4
+ - Source: hape-en-gpt4o-vs-human-2026-05-22
5
+ - Validated at: 2026-05-22
6
+ - Input: artifacts/rebaseline-2025/private/hape-en.private.jsonl
7
+ - Entries evaluated: 108
8
+ - Decision summary: 88 keep / 20 drop
9
+ - Gate: **PASS** (8290 hot docs, 8290 cold docs)
10
+ - Source note: HAP-E MIT English paired corpus: GPT-4o 2024-08-06 continuations vs human chunk_2; raw text kept local/private, aggregate only committed.
11
+
12
+ ## Source provenance
13
+
14
+ - <https://huggingface.co/datasets/browndw/human-ai-parallel-corpus>
15
+ - <https://cmustatistics.github.io/data-repository/language/hap-e.html>
16
+ - Public report policy: aggregate counts only; raw corpus rows stay local/private.
17
+
18
+ ## Register coverage
19
+
20
+ | class | registers |
21
+ |---|---|
22
+ | hot | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
23
+ | cold | acad=1227, blog=1526, fic=1395, news=1322, spok=1721, tvm=1099 |
24
+
25
+ ## Entry decisions
26
+
27
+ | decision | kind | entry | hot docs | cold docs | lift | cold rate |
28
+ |---|---|---|---:|---:|---:|---:|
29
+ | drop | phrase | a host of | 9 | 14 | 0.64 | 0.17% |
30
+ | drop | phrase | a wide range of | 26 | 47 | 0.55 | 0.57% |
31
+ | drop | phrase | close the gap | 1 | 1 | 1 | 0.01% |
32
+ | drop | phrase | driving force | 25 | 7 | 3.57 | 0.08% |
33
+ | drop | phrase | end-to-end | 1 | 3 | 0.33 | 0.04% |
34
+ | drop | phrase | gain a deeper understanding | 0 | 0 | 0 | 0.00% |
35
+ | drop | phrase | in the age of | 6 | 4 | 1.5 | 0.05% |
36
+ | drop | phrase | it is essential to | 32 | 13 | 2.46 | 0.16% |
37
+ | drop | phrase | key drivers | 1 | 3 | 0.33 | 0.04% |
38
+ | drop | phrase | on the other hand | 157 | 158 | 0.99 | 1.91% |
39
+ | drop | phrase | play a key role | 1 | 5 | 0.2 | 0.06% |
40
+ | drop | phrase | to ensure that | 140 | 43 | 3.26 | 0.52% |
41
+ | drop | phrase | under the hood | 3 | 2 | 1.5 | 0.02% |
42
+ | drop | strict | dimensions | 179 | 46 | 3.89 | 0.55% |
43
+ | drop | strict | elevated | 74 | 61 | 1.21 | 0.74% |
44
+ | drop | strict | enable | 140 | 82 | 1.71 | 0.99% |
45
+ | drop | strict | framework | 380 | 129 | 2.95 | 1.56% |
46
+ | drop | strict | state-of-the-art | 36 | 24 | 1.5 | 0.29% |
47
+ | drop | strict | unleash | 45 | 13 | 3.46 | 0.16% |
48
+ | drop | strict | workflow | 11 | 8 | 1.38 | 0.10% |
49
+ | keep | phrase | a deeper dive | 7 | 0 | Infinity | 0.00% |
50
+ | keep | phrase | a myriad of | 67 | 3 | 22.33 | 0.04% |
51
+ | keep | phrase | a new chapter | 138 | 0 | Infinity | 0.00% |
52
+ | keep | phrase | a new era | 130 | 9 | 14.44 | 0.11% |
53
+ | keep | phrase | a new frontier | 8 | 0 | Infinity | 0.00% |
54
+ | keep | phrase | a plethora of | 26 | 5 | 5.2 | 0.06% |
55
+ | keep | phrase | a robust framework | 28 | 0 | Infinity | 0.00% |
56
+ | keep | phrase | a wide array of | 13 | 1 | 13 | 0.01% |
57
+ | keep | phrase | at its core | 48 | 2 | 24 | 0.02% |
58
+ | keep | phrase | at the forefront | 95 | 2 | 47.5 | 0.02% |
59
+ | keep | phrase | at the heart of | 143 | 18 | 7.94 | 0.22% |
60
+ | keep | phrase | best practices | 51 | 6 | 8.5 | 0.07% |
61
+ | keep | phrase | bridge the gap | 94 | 3 | 31.33 | 0.04% |
62
+ | keep | phrase | comprehensive approach | 33 | 1 | 33 | 0.01% |
63
+ | keep | phrase | continuous improvement | 28 | 1 | 28 | 0.01% |
64
+ | keep | phrase | ever-changing | 74 | 0 | Infinity | 0.00% |
65
+ | keep | phrase | ever-evolving | 144 | 0 | Infinity | 0.00% |
66
+ | keep | phrase | fast-paced | 53 | 3 | 17.67 | 0.04% |
67
+ | keep | phrase | gain valuable insights | 2 | 0 | Infinity | 0.00% |
68
+ | keep | phrase | glean insights | 3 | 0 | Infinity | 0.00% |
69
+ | keep | phrase | harness the power | 8 | 0 | Infinity | 0.00% |
70
+ | keep | phrase | holistic approach | 128 | 4 | 32 | 0.05% |
71
+ | keep | phrase | in the digital age | 23 | 0 | Infinity | 0.00% |
72
+ | keep | phrase | in the modern era | 7 | 1 | 7 | 0.01% |
73
+ | keep | phrase | in today's | 69 | 17 | 4.06 | 0.21% |
74
+ | keep | phrase | key insights | 4 | 0 | Infinity | 0.00% |
75
+ | keep | phrase | key takeaways | 2 | 0 | Infinity | 0.00% |
76
+ | keep | phrase | pave the path | 4 | 0 | Infinity | 0.00% |
77
+ | keep | phrase | pave the way | 133 | 1 | 133 | 0.01% |
78
+ | keep | phrase | play a crucial role | 75 | 4 | 18.75 | 0.05% |
79
+ | keep | phrase | plays a vital role | 11 | 1 | 11 | 0.01% |
80
+ | keep | phrase | rapidly changing | 42 | 1 | 42 | 0.01% |
81
+ | keep | phrase | rapidly evolving | 32 | 1 | 32 | 0.01% |
82
+ | keep | phrase | realize the potential | 3 | 0 | Infinity | 0.00% |
83
+ | keep | phrase | the bigger picture | 22 | 2 | 11 | 0.02% |
84
+ | keep | phrase | the competitive landscape | 1 | 0 | Infinity | 0.00% |
85
+ | keep | phrase | the digital landscape | 13 | 0 | Infinity | 0.00% |
86
+ | keep | phrase | the future of | 212 | 24 | 8.83 | 0.29% |
87
+ | keep | phrase | the landscape of | 134 | 1 | 134 | 0.01% |
88
+ | keep | phrase | the realm of | 224 | 7 | 32 | 0.08% |
89
+ | keep | phrase | the regulatory landscape | 4 | 0 | Infinity | 0.00% |
90
+ | keep | phrase | the world of | 241 | 42 | 5.74 | 0.51% |
91
+ | keep | phrase | unlock the potential | 6 | 0 | Infinity | 0.00% |
92
+ | keep | phrase | usher in | 37 | 6 | 6.17 | 0.07% |
93
+ | keep | phrase | valuable insights | 124 | 3 | 41.33 | 0.04% |
94
+ | keep | strict | accelerate | 69 | 17 | 4.06 | 0.21% |
95
+ | keep | strict | actionable | 104 | 3 | 34.67 | 0.04% |
96
+ | keep | strict | align | 370 | 17 | 21.76 | 0.21% |
97
+ | keep | strict | alignment | 135 | 23 | 5.87 | 0.28% |
98
+ | keep | strict | amplify | 117 | 5 | 23.4 | 0.06% |
99
+ | keep | strict | bespoke | 69 | 8 | 8.63 | 0.10% |
100
+ | keep | strict | bolster | 175 | 12 | 14.58 | 0.14% |
101
+ | keep | strict | catalyst | 161 | 26 | 6.19 | 0.31% |
102
+ | keep | strict | compelling | 340 | 27 | 12.59 | 0.33% |
103
+ | keep | strict | curated | 106 | 7 | 15.14 | 0.08% |
104
+ | keep | strict | cutting-edge | 165 | 1 | 165 | 0.01% |
105
+ | keep | strict | dynamic | 765 | 110 | 6.95 | 1.33% |
106
+ | keep | strict | ecosystem | 205 | 48 | 4.27 | 0.58% |
107
+ | keep | strict | elevate | 106 | 4 | 26.5 | 0.05% |
108
+ | keep | strict | empower | 142 | 4 | 35.5 | 0.05% |
109
+ | keep | strict | empowering | 166 | 7 | 23.71 | 0.08% |
110
+ | keep | strict | enabling | 263 | 39 | 6.74 | 0.47% |
111
+ | keep | strict | envision | 117 | 9 | 13 | 0.11% |
112
+ | keep | strict | ethical | 259 | 25 | 10.36 | 0.30% |
113
+ | keep | strict | harness | 218 | 14 | 15.57 | 0.17% |
114
+ | keep | strict | impactful | 83 | 3 | 27.67 | 0.04% |
115
+ | keep | strict | inclusive | 205 | 18 | 11.39 | 0.22% |
116
+ | keep | strict | inflection | 12 | 0 | Infinity | 0.00% |
117
+ | keep | strict | meaningful | 305 | 48 | 6.35 | 0.58% |
118
+ | keep | strict | modalities | 61 | 15 | 4.07 | 0.18% |
119
+ | keep | strict | pivot | 184 | 5 | 36.8 | 0.06% |
120
+ | keep | strict | prioritize | 239 | 3 | 79.67 | 0.04% |
121
+ | keep | strict | reimagine | 22 | 0 | Infinity | 0.00% |
122
+ | keep | strict | rethink | 45 | 3 | 15 | 0.04% |
123
+ | keep | strict | scalable | 34 | 5 | 6.8 | 0.06% |
124
+ | keep | strict | seamless | 176 | 4 | 44 | 0.05% |
125
+ | keep | strict | seamlessly | 352 | 9 | 39.11 | 0.11% |
126
+ | keep | strict | skillset | 4 | 0 | Infinity | 0.00% |
127
+ | keep | strict | streamline | 42 | 3 | 14 | 0.04% |
128
+ | keep | strict | streamlined | 26 | 3 | 8.67 | 0.04% |
129
+ | keep | strict | sustainable | 690 | 67 | 10.3 | 0.81% |
130
+ | keep | strict | thoughtful | 228 | 33 | 6.91 | 0.40% |
131
+ | keep | strict | thrive | 279 | 18 | 15.5 | 0.22% |
132
+ | keep | strict | thriving | 137 | 6 | 22.83 | 0.07% |
133
+ | keep | strict | toolkit | 39 | 3 | 13 | 0.04% |
134
+ | keep | strict | transformative | 417 | 5 | 83.4 | 0.06% |
135
+ | keep | strict | unlock | 165 | 15 | 11 | 0.18% |
136
+ | keep | strict | vibrant | 989 | 13 | 76.08 | 0.16% |
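
The `lift` and `cold rate` columns in the report above appear to follow directly from the document counts: lift is hot docs divided by cold docs (the two corpora are the same size, 8290 each), and cold rate is cold docs over the 8290 cold documents. The sketch below reproduces a few rows under that assumption. The function names are hypothetical and are not taken from `scripts/lexicon-freshness.mjs`; the zero-count handling is inferred from the rows that show `Infinity` and `0`, and the keep/drop threshold is not stated in the report, so it is not modeled here.

```python
import math

COLD_DOCS_TOTAL = 8290  # cold document count from the report's gate line


def lift(hot_docs: int, cold_docs: int) -> float:
    """Ratio of hot to cold document counts (equal-sized corpora assumed)."""
    if cold_docs == 0:
        # Inferred from the table: 7/0 is reported as Infinity, 0/0 as 0.
        return math.inf if hot_docs > 0 else 0.0
    return hot_docs / cold_docs


def cold_rate_percent(cold_docs: int, total: int = COLD_DOCS_TOTAL) -> float:
    """Share of cold documents containing the entry, as a percentage."""
    return 100.0 * cold_docs / total


# Row "a host of": 9 hot docs, 14 cold docs.
print(f"{lift(9, 14):.2f}")             # prints 0.64
print(f"{cold_rate_percent(14):.2f}%")  # prints 0.17%
```

Because both corpora contain the same number of documents, the ratio of raw counts equals the ratio of per-corpus rates; with unequal corpus sizes the counts would need to be normalized first.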
@@ -0,0 +1,381 @@
1
+ {
2
+ "schemaVersion": 1,
3
+ "generatedAt": "2026-05-21T18:13:21.576Z",
4
+ "input": "artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl",
5
+ "targets": {
6
+ "protocolPerLanguageClassRegister": 25,
7
+ "claimPerCell": 100,
8
+ "claimLanguages": 2,
9
+ "claimGeneratorFamilies": 3
10
+ },
11
+ "totalRecords": 800,
12
+ "byLanguage": {
13
+ "ko": 400,
14
+ "en": 400
15
+ },
16
+ "byClass": {
17
+ "natural-human": 200,
18
+ "ai-like": 600
19
+ },
20
+ "byRegister": {
21
+ "product-doc": 140,
22
+ "academic-summary": 190,
23
+ "chat-update": 140,
24
+ "blog": 190,
25
+ "technical-how-to": 140
26
+ },
27
+ "byModelFamily": {
28
+ "human-reference": 200,
29
+ "claude-family": 200,
30
+ "gemini-family": 200,
31
+ "gpt-family": 200
32
+ },
33
+ "protocolCoverage": {
34
+ "totalCells": 80,
35
+ "populatedCells": 17,
36
+ "emptyCells": 63,
37
+ "cellsMeetingTarget": 12,
38
+ "underfilledCells": [
39
+ {
40
+ "key": "ko|natural-human|blog",
41
+ "count": 20
42
+ },
43
+ {
44
+ "key": "ko|natural-human|academic-summary",
45
+ "count": 20
46
+ },
47
+ {
48
+ "key": "ko|natural-human|product-doc",
49
+ "count": 20
50
+ },
51
+ {
52
+ "key": "ko|natural-human|chat-update",
53
+ "count": 20
54
+ },
55
+ {
56
+ "key": "ko|natural-human|technical-how-to",
57
+ "count": 20
58
+ }
59
+ ]
60
+ },
61
+ "claimGate": {
62
+ "ready": true,
63
+ "blockers": [],
64
+ "qualifiedPositiveCells": [
65
+ {
66
+ "key": "en|claude-family",
67
+ "count": 100
68
+ },
69
+ {
70
+ "key": "en|gemini-family",
71
+ "count": 100
72
+ },
73
+ {
74
+ "key": "en|gpt-family",
75
+ "count": 100
76
+ },
77
+ {
78
+ "key": "ko|claude-family",
79
+ "count": 100
80
+ },
81
+ {
82
+ "key": "ko|gemini-family",
83
+ "count": 100
84
+ },
85
+ {
86
+ "key": "ko|gpt-family",
87
+ "count": 100
88
+ }
89
+ ],
90
+ "qualifiedNaturalCells": [
91
+ {
92
+ "key": "ko",
93
+ "count": 100
94
+ },
95
+ {
96
+ "key": "en",
97
+ "count": 100
98
+ }
99
+ ]
100
+ },
101
+ "metrics": {
102
+ "tp": 404,
103
+ "fp": 32,
104
+ "fn": 196,
105
+ "tn": 168,
106
+ "total": 800,
107
+ "accuracy": 0.715,
108
+ "precision": 0.927,
109
+ "recall": 0.673,
110
+ "f1": 0.78,
111
+ "falsePositiveRate": 0.16,
112
+ "falseNegativeRate": 0.327,
113
+ "accuracyCi": {
114
+ "low": 0.683,
115
+ "high": 0.745,
116
+ "method": "Wilson score interval, 95%"
117
+ },
118
+ "recallCi": {
119
+ "low": 0.635,
120
+ "high": 0.71,
121
+ "method": "Wilson score interval, 95%"
122
+ },
123
+ "falsePositiveRateCi": {
124
+ "low": 0.116,
125
+ "high": 0.217,
126
+ "method": "Wilson score interval, 95%"
127
+ }
128
+ },
129
+ "catchByLanguageFamily": {
130
+ "en|claude-family": {
131
+ "language": "en",
132
+ "modelFamily": "claude-family",
133
+ "n": 100,
134
+ "caught": 74,
135
+ "missed": 26,
136
+ "catchRate": 0.74,
137
+ "catchRateCi": {
138
+ "low": 0.646,
139
+ "high": 0.816,
140
+ "method": "Wilson score interval, 95%"
141
+ }
142
+ },
143
+ "en|gemini-family": {
144
+ "language": "en",
145
+ "modelFamily": "gemini-family",
146
+ "n": 100,
147
+ "caught": 79,
148
+ "missed": 21,
149
+ "catchRate": 0.79,
150
+ "catchRateCi": {
151
+ "low": 0.7,
152
+ "high": 0.858,
153
+ "method": "Wilson score interval, 95%"
154
+ }
155
+ },
156
+ "en|gpt-family": {
157
+ "language": "en",
158
+ "modelFamily": "gpt-family",
159
+ "n": 100,
160
+ "caught": 77,
161
+ "missed": 23,
162
+ "catchRate": 0.77,
163
+ "catchRateCi": {
164
+ "low": 0.678,
165
+ "high": 0.842,
166
+ "method": "Wilson score interval, 95%"
167
+ }
168
+ },
169
+ "ko|claude-family": {
170
+ "language": "ko",
171
+ "modelFamily": "claude-family",
172
+ "n": 100,
173
+ "caught": 68,
174
+ "missed": 32,
175
+ "catchRate": 0.68,
176
+ "catchRateCi": {
177
+ "low": 0.583,
178
+ "high": 0.763,
179
+ "method": "Wilson score interval, 95%"
180
+ }
181
+ },
182
+ "ko|gemini-family": {
183
+ "language": "ko",
184
+ "modelFamily": "gemini-family",
185
+ "n": 100,
186
+ "caught": 62,
187
+ "missed": 38,
188
+ "catchRate": 0.62,
189
+ "catchRateCi": {
190
+ "low": 0.522,
191
+ "high": 0.709,
192
+ "method": "Wilson score interval, 95%"
193
+ }
194
+ },
195
+ "ko|gpt-family": {
196
+ "language": "ko",
197
+ "modelFamily": "gpt-family",
198
+ "n": 100,
199
+ "caught": 44,
200
+ "missed": 56,
201
+ "catchRate": 0.44,
202
+ "catchRateCi": {
203
+ "low": 0.347,
204
+ "high": 0.538,
205
+ "method": "Wilson score interval, 95%"
206
+ }
207
+ }
208
+ },
209
+ "falsePositiveByLanguage": {
210
+ "en": {
211
+ "language": "en",
212
+ "n": 100,
213
+ "falsePositives": 14,
214
+ "trueNegatives": 86,
215
+ "falsePositiveRate": 0.14,
216
+ "falsePositiveRateCi": {
217
+ "low": 0.085,
218
+ "high": 0.221,
219
+ "method": "Wilson score interval, 95%"
220
+ }
221
+ },
222
+ "ko": {
223
+ "language": "ko",
224
+ "n": 100,
225
+ "falsePositives": 18,
226
+ "trueNegatives": 82,
227
+ "falsePositiveRate": 0.18,
228
+ "falsePositiveRateCi": {
229
+ "low": 0.117,
230
+ "high": 0.267,
231
+ "method": "Wilson score interval, 95%"
232
+ }
233
+ }
234
+ },
235
+ "metricsByRegister": {
236
+ "academic-summary": {
237
+ "tp": 89,
238
+ "fp": 18,
239
+ "fn": 31,
240
+ "tn": 52,
241
+ "total": 190,
242
+ "accuracy": 0.742,
243
+ "precision": 0.832,
244
+ "recall": 0.742,
245
+ "f1": 0.784,
246
+ "falsePositiveRate": 0.257,
247
+ "falseNegativeRate": 0.258,
248
+ "accuracyCi": {
249
+ "low": 0.676,
250
+ "high": 0.799,
251
+ "method": "Wilson score interval, 95%"
252
+ },
253
+ "recallCi": {
254
+ "low": 0.657,
255
+ "high": 0.812,
256
+ "method": "Wilson score interval, 95%"
257
+ },
258
+ "falsePositiveRateCi": {
259
+ "low": 0.169,
260
+ "high": 0.37,
261
+ "method": "Wilson score interval, 95%"
262
+ }
263
+ },
264
+ "blog": {
265
+ "tp": 70,
266
+ "fp": 6,
267
+ "fn": 50,
268
+ "tn": 64,
269
+ "total": 190,
270
+ "accuracy": 0.705,
271
+ "precision": 0.921,
272
+ "recall": 0.583,
273
+ "f1": 0.714,
274
+ "falsePositiveRate": 0.086,
275
+ "falseNegativeRate": 0.417,
276
+ "accuracyCi": {
277
+ "low": 0.637,
278
+ "high": 0.766,
279
+ "method": "Wilson score interval, 95%"
280
+ },
281
+ "recallCi": {
282
+ "low": 0.494,
283
+ "high": 0.668,
284
+ "method": "Wilson score interval, 95%"
285
+ },
286
+ "falsePositiveRateCi": {
287
+ "low": 0.04,
288
+ "high": 0.175,
289
+ "method": "Wilson score interval, 95%"
290
+ }
291
+ },
292
+ "chat-update": {
293
+ "tp": 61,
294
+ "fp": 0,
295
+ "fn": 59,
296
+ "tn": 20,
297
+ "total": 140,
298
+ "accuracy": 0.579,
299
+ "precision": 1,
300
+ "recall": 0.508,
301
+ "f1": 0.674,
302
+ "falsePositiveRate": 0,
303
+ "falseNegativeRate": 0.492,
304
+ "accuracyCi": {
305
+ "low": 0.496,
306
+ "high": 0.657,
307
+ "method": "Wilson score interval, 95%"
308
+ },
309
+ "recallCi": {
310
+ "low": 0.42,
311
+ "high": 0.596,
312
+ "method": "Wilson score interval, 95%"
313
+ },
314
+ "falsePositiveRateCi": {
315
+ "low": 0,
316
+ "high": 0.161,
317
+ "method": "Wilson score interval, 95%"
318
+ }
319
+ },
320
+ "product-doc": {
321
+ "tp": 94,
322
+ "fp": 2,
323
+ "fn": 26,
324
+ "tn": 18,
325
+ "total": 140,
326
+ "accuracy": 0.8,
327
+ "precision": 0.979,
328
+ "recall": 0.783,
329
+ "f1": 0.87,
330
+ "falsePositiveRate": 0.1,
331
+ "falseNegativeRate": 0.217,
332
+ "accuracyCi": {
333
+ "low": 0.726,
334
+ "high": 0.858,
335
+ "method": "Wilson score interval, 95%"
336
+ },
337
+ "recallCi": {
338
+ "low": 0.701,
339
+ "high": 0.848,
340
+ "method": "Wilson score interval, 95%"
341
+ },
342
+ "falsePositiveRateCi": {
343
+ "low": 0.028,
344
+ "high": 0.301,
345
+ "method": "Wilson score interval, 95%"
346
+ }
347
+ },
348
+ "technical-how-to": {
349
+ "tp": 90,
350
+ "fp": 6,
351
+ "fn": 30,
352
+ "tn": 14,
353
+ "total": 140,
354
+ "accuracy": 0.743,
355
+ "precision": 0.938,
356
+ "recall": 0.75,
357
+ "f1": 0.833,
358
+ "falsePositiveRate": 0.3,
359
+ "falseNegativeRate": 0.25,
360
+ "accuracyCi": {
361
+ "low": 0.665,
362
+ "high": 0.808,
363
+ "method": "Wilson score interval, 95%"
364
+ },
365
+ "recallCi": {
366
+ "low": 0.666,
367
+ "high": 0.819,
368
+ "method": "Wilson score interval, 95%"
369
+ },
370
+ "falsePositiveRateCi": {
371
+ "low": 0.145,
372
+ "high": 0.519,
373
+ "method": "Wilson score interval, 95%"
374
+ }
375
+ }
376
+ },
377
+ "validation": {
378
+ "errors": [],
379
+ "warnings": []
380
+ }
381
+ }
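
The headline values in the `metrics` block of the JSON above are consistent with the standard confusion-matrix definitions applied to `tp`, `fp`, `fn`, and `tn`. A minimal sketch for re-deriving them; the function name and the returned dictionary layout are illustrative assumptions, not the package's own code:

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    positives = tp + fn  # samples labeled ai-like
    negatives = fp + tn  # samples labeled natural-human
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / positives if positives else 0.0,
        # F1 written in count form: 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "falsePositiveRate": fp / negatives if negatives else 0.0,
        "falseNegativeRate": fn / positives if positives else 0.0,
    }


# Overall counts from the report: TP/FP/FN/TN = 404/32/196/168.
overall = confusion_metrics(404, 32, 196, 168)
for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

With the overall counts this yields accuracy 0.715, precision 0.927, recall 0.673, F1 0.780, false positive rate 0.160, and false negative rate 0.327, matching the reported values to three decimals.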
@@ -0,0 +1,121 @@
1
+ # Rebaseline Manifest Summary
2
+
3
+ - Generated at: 2026-05-21T18:13:21.576Z
4
+ - Input: `artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl`
5
+ - Records: 800
6
+ - Protocol target: 25 samples per language × class × register cell
7
+ - Public claim target: 100 samples per claim cell, 2+ languages, 3+ generator families
8
+
9
+ ## Validation
10
+
11
+ Validation: **PASS**
12
+
13
+ ## Coverage snapshot
14
+
15
+ ### By language
16
+
17
+ | value | n |
18
+ |---|---:|
19
+ | ko | 400 |
20
+ | en | 400 |
21
+ | zh | 0 |
22
+ | ja | 0 |
23
+
24
+ ### By class
25
+
26
+ | value | n |
27
+ |---|---:|
28
+ | ai-like | 600 |
29
+ | natural-human | 200 |
30
+ | lightly-edited-ai | 0 |
31
+ | heavily-edited-ai | 0 |
32
+
33
+ ### By register
34
+
35
+ | value | n |
36
+ |---|---:|
37
+ | blog | 190 |
38
+ | academic-summary | 190 |
39
+ | product-doc | 140 |
40
+ | chat-update | 140 |
41
+ | technical-how-to | 140 |
42
+
43
+ ### By model family
44
+
45
+ | value | n |
46
+ |---|---:|
47
+ | gpt-family | 200 |
48
+ | claude-family | 200 |
49
+ | gemini-family | 200 |
50
+ | open-weight | 0 |
51
+ | human-reference | 200 |
52
+
53
+ ## Protocol matrix
54
+
55
+ - Populated language × class × register cells: 17/80
56
+ - Cells meeting 25+ samples: 12
57
+ - Empty cells: 63
58
+ - Underfilled populated cells: 5
59
+
60
+ | cell | n |
61
+ |---|---:|
62
+ | ko × natural-human × blog | 20 |
63
+ | ko × natural-human × academic-summary | 20 |
64
+ | ko × natural-human × product-doc | 20 |
65
+ | ko × natural-human × chat-update | 20 |
66
+ | ko × natural-human × technical-how-to | 20 |
67
+
68
+ ## Public performance claim gate
69
+
70
+ Public performance claim: **READY**
71
+
72
+ Gate conditions met by this manifest.
73
+
74
+ | claim-gate count | value |
75
+ |---|---:|
76
+ | qualified positive cells (language × generator family, n≥100) | 6 |
77
+ | qualified natural-language cells (language, n≥100) | 2 |
78
+ | outcome rows with expected/predicted labels | 800 |
79
+
80
+ ## Outcome metrics
81
+
82
+ | metric | value |
83
+ |---|---:|
84
+ | accuracy | 71.5% |
85
+ | accuracy CI | 68.3%–74.5% |
86
+ | precision | 92.7% |
87
+ | recall | 67.3% |
88
+ | recall CI | 63.5%–71.0% |
89
+ | F1 | 0.780 |
90
+ | false positive rate | 16.0% |
91
+ | false positive rate CI | 11.6%–21.7% |
92
+ | false negative rate | 32.7% |
93
+ | TP/FP/FN/TN | 404/32/196/168 |
94
+
95
+ ### Catch rate by language × model family
96
+
97
+ | language | model family | n | catch rate | 95% CI | caught/missed |
98
+ |---|---|---:|---:|---:|---:|
99
+ | en | claude-family | 100 | 74.0% | 64.6%–81.6% | 74/26 |
100
+ | en | gemini-family | 100 | 79.0% | 70.0%–85.8% | 79/21 |
101
+ | en | gpt-family | 100 | 77.0% | 67.8%–84.2% | 77/23 |
102
+ | ko | claude-family | 100 | 68.0% | 58.3%–76.3% | 68/32 |
103
+ | ko | gemini-family | 100 | 62.0% | 52.2%–70.9% | 62/38 |
104
+ | ko | gpt-family | 100 | 44.0% | 34.7%–53.8% | 44/56 |
105
+
106
+ ### False-positive rate by language
107
+
108
+ | language | n | false-positive rate | 95% CI | FP/TN |
109
+ |---|---:|---:|---:|---:|
110
+ | en | 100 | 14.0% | 8.5%–22.1% | 14/86 |
111
+ | ko | 100 | 18.0% | 11.7%–26.7% | 18/82 |
112
+
113
+ ### By register
114
+
115
+ | register | n | FP rate | FN rate | TP/FP/FN/TN |
116
+ |---|---:|---:|---:|---:|
117
+ | blog | 190 | 8.6% | 41.7% | 70/6/50/64 |
118
+ | academic-summary | 190 | 25.7% | 25.8% | 89/18/31/52 |
119
+ | product-doc | 140 | 10.0% | 21.7% | 94/2/26/18 |
120
+ | chat-update | 140 | 0.0% | 49.2% | 61/0/59/20 |
121
+ | technical-how-to | 140 | 30.0% | 25.0% | 90/6/30/14 |
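
The 95% CI columns in the summary above are labeled "Wilson score interval, 95%" in the accompanying JSON. A minimal sketch of that interval, assuming the usual normal quantile z = 1.96 (the exact constant used by the package's scripts is not shown in this diff) and a hypothetical function name:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return (max(0.0, center - half_width), min(1.0, center + half_width))


# Overall accuracy: 572 correct out of 800 records.
low, high = wilson_interval(404 + 168, 800)
print(f"{low:.3f} {high:.3f}")  # prints 0.683 0.745
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative at the extremes, which matters for cells such as chat-update, where 0 false positives out of 20 still gives an upper bound of about 16.1%.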