patina-cli 3.11.0 → 4.0.0

Files changed (193)
  1. package/.patina.default.yaml +29 -29
  2. package/CHANGELOG.md +53 -0
  3. package/NOTICE +21 -0
  4. package/README.md +117 -224
  5. package/README_JA.md +134 -77
  6. package/README_KR.md +132 -74
  7. package/README_ZH.md +137 -80
  8. package/SKILL.md +11 -20
  9. package/artifacts/rebaseline-2025/README.md +147 -0
  10. package/artifacts/rebaseline-2025/human-controls.public.jsonl +250 -0
  11. package/artifacts/rebaseline-2025/intake.example.jsonl +2 -0
  12. package/artifacts/rebaseline-2025/intake.local.example.jsonl +25 -0
  13. package/artifacts/rebaseline-2025/prompts.template.jsonl +7 -0
  14. package/artifacts/rebaseline-2025/sources.ko-public.jsonl +39 -0
  15. package/assets/brand/patina-badge.svg +18 -0
  16. package/assets/brand/patina-mark.svg +8 -0
  17. package/assets/demo/README.md +79 -0
  18. package/core/scoring.md +12 -12
  19. package/core/standalone-prompt.md +3 -1
  20. package/core/stylometry.md +93 -22
  21. package/docs/API.md +1554 -0
  22. package/docs/AUTHENTICATION.md +50 -26
  23. package/docs/AUTHENTICATION_KR.md +54 -29
  24. package/docs/BRANDING.md +9 -8
  25. package/docs/CLI.md +55 -14
  26. package/docs/COOKBOOK.md +8 -21
  27. package/docs/DEMO.md +32 -5
  28. package/docs/EXIT-CODES.md +2 -3
  29. package/docs/FALSE-POSITIVES.md +63 -0
  30. package/docs/FAQ.md +9 -1
  31. package/docs/FAQ_KR.md +3 -1
  32. package/docs/FLAG-PARITY.md +33 -47
  33. package/docs/ISSUE-WAVES.md +57 -0
  34. package/docs/PATTERNS-EN.md +67 -3
  35. package/docs/PATTERNS-JA.md +68 -2
  36. package/docs/PATTERNS-KO.md +70 -7
  37. package/docs/PATTERNS-ZH.md +67 -3
  38. package/docs/PATTERNS.md +5 -5
  39. package/docs/RESEARCH-DOCS-PLATFORM.md +54 -0
  40. package/docs/ROADMAP.md +46 -66
  41. package/docs/TRANSLATIONESE-KO.md +51 -0
  42. package/docs/audits/2026-05-deep-research.md +3 -1
  43. package/docs/benchmarks/README.md +51 -0
  44. package/docs/benchmarks/detector-comparison.json +69 -9
  45. package/docs/benchmarks/detector-comparison.md +10 -5
  46. package/docs/benchmarks/katfish-ko-latest.json +657 -0
  47. package/docs/benchmarks/katfish-ko-latest.md +77 -0
  48. package/docs/benchmarks/latest.json +1183 -108
  49. package/docs/benchmarks/latest.md +84 -60
  50. package/docs/benchmarks/lexicon-freshness-en-2026-05-22.json +1121 -0
  51. package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md +136 -0
  52. package/docs/benchmarks/rebaseline-latest.json +381 -0
  53. package/docs/benchmarks/rebaseline-latest.md +121 -0
  54. package/docs/benchmarks/register-stratified-latest.json +164 -0
  55. package/docs/benchmarks/register-stratified-latest.md +99 -0
  56. package/docs/benchmarks/register-stratified.md +43 -0
  57. package/docs/integrations/github-action.md +44 -11
  58. package/docs/integrations/playground.md +58 -0
  59. package/docs/integrations/pre-commit.md +5 -5
  60. package/docs/integrations/release.md +5 -3
  61. package/docs/integrations/static-sites.md +83 -0
  62. package/docs/research/2025-rebaseline-plan.md +71 -2
  63. package/docs/research/2026-rebaseline.md +102 -0
  64. package/docs/research/adversarial-mps.md +41 -0
  65. package/docs/research/ai-human-metrics.md +35 -23
  66. package/docs/research/human-eval-panel.md +42 -0
  67. package/docs/research/judge-agreement.md +24 -0
  68. package/docs/research/ko-2025-corpus-sources.md +135 -0
  69. package/docs/research/lexicon-freshness-audit.md +64 -0
  70. package/docs/research/zh-ja-lexicon-calibration.md +60 -0
  71. package/docs/social/patina-launch-copy.md +173 -100
  72. package/docs/social/patina-launch-execution.md +94 -0
  73. package/docs/social/patina-launch-korean-first.md +83 -0
  74. package/docs/social/signs-of-ai-writing.md +26 -0
  75. package/docs/social/signs-of-ai-writing_KR.md +26 -0
  76. package/lexicon/ai-en.md +21 -24
  77. package/lexicon/ai-ja.md +158 -0
  78. package/lexicon/ai-ko.md +9 -9
  79. package/lexicon/ai-zh.md +158 -0
  80. package/lexicon/provenance/ai-en.json +970 -0
  81. package/lexicon/provenance/ai-ja.json +542 -0
  82. package/lexicon/provenance/ai-ko.json +866 -0
  83. package/lexicon/provenance/ai-zh.json +542 -0
  84. package/package.json +49 -8
  85. package/patterns/en-communication.md +5 -0
  86. package/patterns/en-content.md +5 -0
  87. package/patterns/en-filler.md +5 -0
  88. package/patterns/en-language.md +29 -1
  89. package/patterns/en-structure.md +5 -0
  90. package/patterns/en-style.md +5 -0
  91. package/patterns/en-viral-hook.md +42 -2
  92. package/patterns/ja-communication.md +5 -0
  93. package/patterns/ja-content.md +5 -0
  94. package/patterns/ja-filler.md +5 -0
  95. package/patterns/ja-language.md +33 -1
  96. package/patterns/ja-structure.md +12 -0
  97. package/patterns/ja-style.md +5 -0
  98. package/patterns/ja-viral-hook.md +41 -2
  99. package/patterns/ko-communication.md +5 -0
  100. package/patterns/ko-content.md +5 -0
  101. package/patterns/ko-filler.md +5 -0
  102. package/patterns/ko-language.md +33 -1
  103. package/patterns/ko-structure.md +25 -6
  104. package/patterns/ko-style.md +5 -0
  105. package/patterns/ko-viral-hook.md +38 -2
  106. package/patterns/zh-communication.md +5 -0
  107. package/patterns/zh-content.md +5 -0
  108. package/patterns/zh-filler.md +5 -0
  109. package/patterns/zh-language.md +37 -1
  110. package/patterns/zh-structure.md +12 -0
  111. package/patterns/zh-style.md +5 -0
  112. package/patterns/zh-viral-hook.md +38 -2
  113. package/playground/README.md +55 -0
  114. package/playground/analytics.js +4 -0
  115. package/playground/analyzer.js +883 -0
  116. package/playground/app.js +157 -0
  117. package/playground/data/lexicons.js +343 -0
  118. package/playground/index.html +138 -0
  119. package/playground/styles.css +267 -0
  120. package/profiles/namuwiki.md +111 -0
  121. package/scripts/adversarial-mps-report.mjs +201 -0
  122. package/scripts/badge-json.mjs +79 -0
  123. package/scripts/benchmark-report.mjs +56 -9
  124. package/scripts/check-release-metadata.mjs +0 -2
  125. package/scripts/detector-comparison.mjs +7 -7
  126. package/scripts/generate-playground-data.mjs +77 -0
  127. package/scripts/katfish-calibration.mjs +464 -0
  128. package/scripts/lexicon-freshness.mjs +485 -0
  129. package/scripts/lint.mjs +1 -1
  130. package/scripts/precommit-score.mjs +4 -3
  131. package/scripts/prose-score.mjs +81 -5
  132. package/scripts/rebaseline-intake.mjs +242 -0
  133. package/scripts/rebaseline-score.mjs +268 -0
  134. package/scripts/rebaseline-summary.mjs +773 -0
  135. package/scripts/rebaseline-web-collect.mjs +410 -0
  136. package/scripts/update-benchmark-ranges.mjs +1 -0
  137. package/src/api.js +69 -105
  138. package/src/auth.js +50 -2
  139. package/src/backends/claude-cli.js +19 -4
  140. package/src/backends/codex-cli.js +19 -3
  141. package/src/backends/contract.js +230 -1
  142. package/src/backends/gemini-cli.js +18 -5
  143. package/src/backends/index.js +87 -12
  144. package/src/backends/kimi-cli.js +161 -0
  145. package/src/cli.js +577 -567
  146. package/src/commands/doctor.js +2 -2
  147. package/src/config.js +29 -0
  148. package/src/errors.js +53 -1
  149. package/src/features/discourse-tells.js +68 -0
  150. package/src/features/index.js +82 -8
  151. package/src/features/lexicon.js +40 -6
  152. package/src/features/markup-leakage.js +69 -0
  153. package/src/features/segment.js +41 -0
  154. package/src/features/signal-strength.js +81 -0
  155. package/src/features/stylometry.js +231 -1
  156. package/src/features/translationese.js +127 -0
  157. package/src/loader.js +76 -0
  158. package/src/logger.js +22 -23
  159. package/src/model-defaults.js +55 -0
  160. package/src/ouroboros.js +31 -0
  161. package/src/output.js +102 -90
  162. package/src/prompt-builder.js +103 -68
  163. package/src/providers.js +51 -4
  164. package/src/scoring.js +210 -2
  165. package/src/security.js +75 -0
  166. package/tests/fixtures/live-quality/en/public-docs-01.md +26 -0
  167. package/tests/fixtures/live-quality/ko/public-docs-01.md +26 -0
  168. package/tests/fixtures/suspect-zones/expected-ranges.json +207 -16
  169. package/tests/fixtures/suspect-zones/ja/ai/ja-ai-04-lexicon.md +11 -0
  170. package/tests/fixtures/suspect-zones/ja/natural/ja-nat-04-lexicon-cold.md +11 -0
  171. package/tests/fixtures/suspect-zones/ko/ai/ko-ai-02.md +4 -5
  172. package/tests/fixtures/suspect-zones/ko/ai/ko-ai-07-ko-diagnostic.md +11 -0
  173. package/tests/fixtures/suspect-zones/zh/ai/zh-ai-04-lexicon.md +11 -0
  174. package/tests/fixtures/suspect-zones/zh/natural/zh-nat-04-lexicon-cold.md +11 -0
  175. package/tests/quality/README.md +188 -11
  176. package/tests/quality/adversarial-mps/fixtures.jsonl +10 -0
  177. package/tests/quality/benchmark.mjs +39 -1
  178. package/tests/quality/dogfood.mjs +5 -3
  179. package/tests/quality/live-fixtures.jsonl +2 -0
  180. package/tests/quality/live-quality.mjs +596 -0
  181. package/tests/quality/ranking-metrics.mjs +136 -0
  182. package/tests/quality/rebaseline-manifest.example.jsonl +5 -0
  183. package/vercel.json +53 -0
  184. package/SKILL-MAX.md +0 -455
  185. package/docs/internal/HARNESS.md +0 -14
  186. package/docs/internal/README.md +0 -14
  187. package/docs/internal/WARP.md +0 -23
  188. package/patina-max/SKILL.md +0 -523
  189. package/patina-max/composite.py +0 -457
  190. package/src/cache.js +0 -106
  191. package/src/commands/init.js +0 -208
  192. package/src/manifest.js +0 -162
  193. package/src/max-mode.js +0 -207
@@ -0,0 +1,2 @@
+ {"sample_id":"ko-public-blog-ai-001","language":"ko","class":"ai-like","register":"blog","model_family":"gpt-family","provider":"fixture","model":"fixture-gpt-family-2026-05","generated_at":"2026-05-21","prompt_id":"rb25-ko-blog-001","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"repo-ok","text":"이 가이드는 팀이 더 명확한 메시지를 만들 수 있도록 단계별 방법을 제공합니다."}
+ {"sample_id":"ko-public-productdoc-human-001","language":"ko","class":"natural-human","register":"product-doc","model_family":"human-reference","provider":"fixture","model":"maintainer-written-reference","generated_at":"2026-05-21","prompt_id":"rb25-ko-product-doc-001","decoding":"not-applicable","postprocess":{"editing_pass":"maintainer copyedit"},"redistribution":"repo-ok","text":"설정값은 프로젝트 파일에서 먼저 읽는다. 값이 없을 때만 기본값을 쓴다."}
@@ -0,0 +1,25 @@
+ {"sample_id":"ko-human-control-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:eed8a722a97e18c4f151faa694c8126e85bab85ec54d9d9f8b924fc7bb4f9c7a","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"academic-summary","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-academic-da-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-02","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:58902326131c45cd5a44f65da5e8eaa71e150f350d18827ada9bc82fd7dc8a0c","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"academic-summary","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-academic-da-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-03","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:3b970b4c6f789ce5e18d5adb74c0175a1cc595ee69bc7cc4426cbf7fd14b0bc1","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"product-doc","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-04","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:4bad105fb55795f507737ac0ab5092476442cb0f5a98993d8a0709e2103da73d","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"product-doc","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-05","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:f13ef06a1e276cd9257f06411206eb3c55db0fa96cc0d373c0d234d7277d5375","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"blog","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-06","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:b171223e21ee621901f4b3478e70ddf8d8624fdc71376f200605c4b08b5a8556","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"blog","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-07","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:cee3cb6aa869a48288b2c513fcc9cc8c9bbf5b6cc3b4e2c4a78cf2e495f3d797","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"chat-update","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-chat-update-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-human-control-08","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:b60f46512232af2bf08ae38f45c1ca364cf6869b4423a94cbf633fede042db94","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"natural-human","register":"chat-update","model_family":"human-reference","provider":"local-human-control","model":"maintainer-or-approved-human-source","prompt_id":"rb25-ko-chat-update-001","reviewer_notes":"Native Korean control candidate; keep raw text local unless explicit fixture permission exists."}
+ {"sample_id":"ko-ai-01","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:e7fb5c91d8e41e685afd24fd43be0ae329c625316b4362614461ad9d97cffe64","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"blog","model_family":"gpt-family","provider":"openai","model":"replace-with-current-gpt-model","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-02","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:8cb64dabc655a9b8c1259ef928169ddb44cb5804c1450970688a162a6ee34fa2","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"product-doc","model_family":"gpt-family","provider":"openai","model":"replace-with-current-gpt-model","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-03","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:aa79ea5db563218758b675447eecfc08b0802aeb986ef8a52ae36ccd637643c7","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"blog","model_family":"claude-family","provider":"anthropic","model":"replace-with-current-claude-model","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-04","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:d913797c36a6d0cbcaf3b73e4f05bd2bcf193447f36538072ef86fdb8c51d79c","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"product-doc","model_family":"claude-family","provider":"anthropic","model":"replace-with-current-claude-model","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-05","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:f3f9caeb75ea6cbb01bc69fd9b54f88fe4205eb8423e5892526f141467a1b16b","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"blog","model_family":"gemini-family","provider":"google","model":"replace-with-current-gemini-model","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-06","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:f2a6d0b0f419572aa3e026a3128b349e80246bdb983220838aeb0791c35e477d","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"product-doc","model_family":"gemini-family","provider":"google","model":"replace-with-current-gemini-model","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-07","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:6fc0a933e288ab1947152b67403a0b43848b9237a8c0e6fc6c8923913c181569","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"blog","model_family":"open-weight","provider":"local-or-hosted-open-weight","model":"replace-with-open-weight-model","prompt_id":"rb25-ko-blog-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-ai-08","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:48570564c3672a6e02cf459fc9adb4c7bf1c5f159f70d11dcb02e0e9f96627e7","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"ai-like","register":"product-doc","model_family":"open-weight","provider":"local-or-hosted-open-weight","model":"replace-with-open-weight-model","prompt_id":"rb25-ko-product-doc-001","reviewer_notes":"Generated from repo-owned prompt; confirm provider redistribution terms before committing text."}
+ {"sample_id":"ko-lightedit-01","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"light human edit"},"redistribution":"hash-only","text_hash":"sha256:1bd8d748521cd2abf1f164b38f0676f9596e729f9a6d5ff6b0cd0d113bfd3937","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"lightly-edited-ai","register":"product-doc","model_family":"gpt-family","provider":"openai","model":"replace-with-gpt-family-model","prompt_id":"rb25-ko-edit-light-001","reviewer_notes":"Store before/after hashes locally; do not use for public claims until paired evidence is complete."}
+ {"sample_id":"ko-lightedit-02","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"light human edit"},"redistribution":"hash-only","text_hash":"sha256:e8b8aaa4f8570df6d89019f84bb0fea043b580907e63f0c28011c105ed307d64","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"lightly-edited-ai","register":"chat-update","model_family":"claude-family","provider":"anthropic","model":"replace-with-claude-family-model","prompt_id":"rb25-ko-edit-light-001","reviewer_notes":"Store before/after hashes locally; do not use for public claims until paired evidence is complete."}
+ {"sample_id":"ko-heavyedit-01","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"heavy human edit"},"redistribution":"hash-only","text_hash":"sha256:f9eece5ccfadfb883fce879ceddad5a2cfa04b636e8712c1c4d9036e34d7adc4","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"heavily-edited-ai","register":"blog","model_family":"gemini-family","provider":"google","model":"replace-with-gemini-family-model","prompt_id":"rb25-ko-edit-heavy-001","reviewer_notes":"Store before/after hashes locally; do not use for public claims until paired evidence is complete."}
+ {"sample_id":"ko-heavyedit-02","language":"ko","generated_at":"2026-05-21","decoding":{"temperature":0.7,"top_p":0.9},"postprocess":{"editing_pass":"heavy human edit"},"redistribution":"hash-only","text_hash":"sha256:dde36e9ecaf2b770f6180012477f0cdf124be010fe0ace86f2e5e333577022d3","source_review":{"status":"template-placeholder","rationale":"Replace text/hash locally before using as evidence; do not commit raw text unless redistribution is cleared."},"class":"heavily-edited-ai","register":"technical-how-to","model_family":"open-weight","provider":"local-or-hosted-open-weight","model":"replace-with-open-weight-model","prompt_id":"rb25-ko-edit-heavy-001","reviewer_notes":"Store before/after hashes locally; do not use for public claims until paired evidence is complete."}
+ {"sample_id":"ko-katfish-essay-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:4002d6936d149b24285e9c2965c5b5c567f9a782e34588ebe24812eeed2606cb","source_review":{"status":"license-review","rationale":"Use hash-only until KatFish redistribution terms are recorded for this repo."},"class":"ai-like","register":"academic-summary","model_family":"open-weight","provider":"katfishnet","model":"katfish-essay-source","prompt_id":"katfish-essay-row-id-replace-me","source_url":"https://github.com/Shinwoo-Park/katfishnet","reviewer_notes":"KatFish essay comparison row; replace hash with local row digest."}
+ {"sample_id":"ko-katfish-poetry-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:fcdd62c1d162426742044a7334341d9513cee1eb657b2eb4a37d500224ac3982","source_review":{"status":"license-review","rationale":"Use hash-only until KatFish redistribution terms are recorded for this repo."},"class":"ai-like","register":"blog","model_family":"open-weight","provider":"katfishnet","model":"katfish-poetry-source","prompt_id":"katfish-poetry-row-id-replace-me","source_url":"https://github.com/Shinwoo-Park/katfishnet","reviewer_notes":"KatFish poetry comparison row; replace hash with local row digest."}
+ {"sample_id":"ko-katfish-abstract-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:70cdc34465d20b4c18a623fb222e5382a7807c690f534bb593ad52ffcba03d28","source_review":{"status":"license-review","rationale":"Use hash-only until KatFish redistribution terms are recorded for this repo."},"class":"ai-like","register":"academic-summary","model_family":"open-weight","provider":"katfishnet","model":"katfish-paper-abstract-source","prompt_id":"katfish-paper-abstract-row-id-replace-me","source_url":"https://github.com/Shinwoo-Park/katfishnet","reviewer_notes":"KatFish paper-abstract comparison row; replace hash with local row digest."}
+ {"sample_id":"ko-fp-learner-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:74c4bb285a763e1de64f8490bdc0e51299b6b02b5d636f7c53d7c169f96e844a","source_review":{"status":"hash-only","rationale":"Learner-writing stress row mapped to academic-summary until register schema expands."},"class":"natural-human","register":"academic-summary","model_family":"human-reference","provider":"learner-corpus","model":"human-reference","prompt_id":"learner-corpus-replace-me","reviewer_notes":"Learner-writing stress row mapped to academic-summary until register schema expands."}
+ {"sample_id":"ko-fp-community-01","language":"ko","generated_at":"2026-05-21","decoding":"not-applicable","postprocess":{"editing_pass":"none"},"redistribution":"hash-only","text_hash":"sha256:a8c399142311d495e36904d801875afbb7d8d51fef9c350813376f2414ef02e5","source_review":{"status":"hash-only","rationale":"Community false-positive row; use only with explicit fixture permission."},"class":"natural-human","register":"chat-update","model_family":"human-reference","provider":"false-positive-form","model":"human-reference","prompt_id":"false-positive-form-replace-me","reviewer_notes":"Community false-positive row; use only with explicit fixture permission."}
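The rows above pair `"redistribution":"hash-only"` with a `text_hash` digest and a `source_review` note warning against committing raw text. A minimal sketch of a check for those constraints follows. The helper names `checkIntakeRow` and `checkIntakeJsonl` are hypothetical and are not taken from `scripts/rebaseline-intake.mjs`; the rules encoded here are assumptions inferred from the rationale strings in the rows, not the package's actual validation logic.

```javascript
// Hypothetical validator; the rules are inferred from the fixture rows,
// not copied from the package's own intake script.
const SHA256_DIGEST = /^sha256:[0-9a-f]{64}$/;

function checkIntakeRow(row) {
  const problems = [];
  if (row.redistribution === "hash-only") {
    // Hash-only rows should never carry the raw sample text.
    if (typeof row.text === "string") {
      problems.push("hash-only row carries raw text");
    }
    // The digest is expected in the form sha256:<64 lowercase hex chars>.
    if (!SHA256_DIGEST.test(row.text_hash ?? "")) {
      problems.push("text_hash is not a sha256:<64 hex> digest");
    }
  }
  // Template rows still need a locally computed hash before use as evidence.
  if (row.source_review?.status === "template-placeholder") {
    problems.push("template placeholder: replace the hash locally first");
  }
  return problems;
}

function checkIntakeJsonl(jsonl) {
  // One JSON object per non-empty line; report only rows with problems.
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .map((row) => ({ sample_id: row.sample_id, problems: checkIntakeRow(row) }))
    .filter((result) => result.problems.length > 0);
}

// Usage: a synthetic row, not one of the fixture rows.
const sample = {
  sample_id: "demo-01",
  redistribution: "hash-only",
  text_hash: "sha256:" + "0".repeat(64),
  source_review: { status: "template-placeholder" },
};
console.log(checkIntakeRow(sample));
```

Under these assumed rules, every row with `"status":"template-placeholder"` would be reported until its hash is replaced locally, which matches the intent stated in the `rationale` fields.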
@@ -0,0 +1,7 @@
+ {"prompt_id":"rb25-ko-academic-da-001","language":"ko","register":"academic-summary","purpose":"native-human-control or model generation anchor","prompt":"한국어 학술 요약체(종결-다)로, 특정 연구 결과를 120~180자 한 문단으로 요약한다. 과장된 표현 없이 방법, 결과, 한계를 모두 포함한다."}
+ {"prompt_id":"rb25-ko-product-doc-001","language":"ko","register":"product-doc","purpose":"native-human-control or model generation anchor","prompt":"개발자 문서의 기능 설명처럼 120~180자 한 문단을 작성한다. 설정값, 예외 조건, 사용자가 확인할 항목을 구체적으로 쓴다."}
+ {"prompt_id":"rb25-ko-blog-001","language":"ko","register":"blog","purpose":"model generation anchor","prompt":"개인 블로그 톤으로 도구를 써본 짧은 후기를 120~180자 한 문단으로 작성한다. 장점 하나와 아쉬운 점 하나를 모두 포함한다."}
+ {"prompt_id":"rb25-ko-chat-update-001","language":"ko","register":"chat-update","purpose":"model generation anchor","prompt":"팀 채팅 업데이트처럼 오늘 진행한 일, 막힌 점, 다음 액션을 100~160자 한 문단으로 정리한다."}
+ {"prompt_id":"rb25-ko-technical-howto-001","language":"ko","register":"technical-how-to","purpose":"model generation anchor","prompt":"터미널 명령을 실행하기 전 확인해야 할 조건과 실패 시 되돌리는 방법을 120~180자 한 문단으로 설명한다."}
+ {"prompt_id":"rb25-ko-edit-light-001","language":"ko","register":"product-doc","purpose":"lightly-edited-ai","prompt":"AI가 작성한 제품 문서 문단을 사람이 맞춤법과 어색한 조사만 살짝 고친 버전으로 만든다. 의미와 문장 구조는 대부분 유지한다."}
+ {"prompt_id":"rb25-ko-edit-heavy-001","language":"ko","register":"blog","purpose":"heavily-edited-ai","prompt":"AI가 작성한 블로그 문단을 사람이 구조와 어휘를 크게 바꾼 버전으로 만든다. 핵심 주장과 사실은 유지한다."}
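Each prompt row above carries a `prompt_id`, and the intake rows earlier in this diff refer to prompts by the same field. Some of those intake rows use placeholder IDs such as `katfish-essay-row-id-replace-me` that do not appear among these seven prompts. The sketch below lists such unresolved references. `parseJsonl` and `findUnknownPromptIds` are hypothetical names, and nothing in the diff confirms that the package enforces this cross-check.

```javascript
// Hypothetical cross-check between a prompts template and intake rows.
function parseJsonl(jsonl) {
  // One JSON object per non-empty line.
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

function findUnknownPromptIds(promptRows, intakeRows) {
  const known = new Set(promptRows.map((prompt) => prompt.prompt_id));
  // Keep only intake rows whose prompt_id is missing from the template.
  return intakeRows
    .filter((row) => !known.has(row.prompt_id))
    .map((row) => ({ sample_id: row.sample_id, prompt_id: row.prompt_id }));
}

// Usage with abbreviated rows; the IDs are copied from the hunks above.
const prompts = parseJsonl(
  '{"prompt_id":"rb25-ko-blog-001"}\n{"prompt_id":"rb25-ko-product-doc-001"}\n'
);
const intake = parseJsonl(
  '{"sample_id":"ko-ai-01","prompt_id":"rb25-ko-blog-001"}\n' +
    '{"sample_id":"ko-katfish-essay-01","prompt_id":"katfish-essay-row-id-replace-me"}\n'
);
console.log(findUnknownPromptIds(prompts, intake));
```

A placeholder ID showing up in this list is expected for template rows; the check is only meaningful once the `replace-me` values have been filled in locally.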
@@ -0,0 +1,39 @@
+ {"source_id":"korea-policy-sme-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148938345","register":"chat-update","source_title":"소상공인 전환보증 3조 2000억 공급…전국 30곳 채무조정센터 설치","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-01-08","max_rows":10}
+ {"source_id":"korea-policy-export-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148948589","register":"chat-update","source_title":"미 상호관세 충격 수출 중소기업 지원 3대 지원책 가동","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-09-03","max_rows":10}
+ {"source_id":"korea-policy-budget-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148942874","register":"chat-update","source_title":"2025년 제1차 추가경정예산 4.8조 원 확정","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-05-02","max_rows":10}
+ {"source_id":"korea-policy-education-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148939202","register":"chat-update","source_title":"주거안정장학금 첫 시행…대학생에 월 최대 20만 원 지원","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-02-04","max_rows":10}
5
+ {"source_id":"korea-policy-car-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148941486","register":"chat-update","source_title":"자동차·부품 업계에 정책금융 2조 원 추가 투입","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-04-09","max_rows":10}
6
+ {"source_id":"toss-navigation-score-2025","url":"https://toss.tech/article/Toss_Navigation_Score","register":"blog","source_title":"토스가 특허 낸 리서치툴, TNS 제작기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-07-04","max_rows":10}
7
+ {"source_id":"toss-bank-interns-2025","url":"https://toss.tech/article/toss-bank-interns","register":"blog","source_title":"슬기로운 토스뱅크 개발 인턴 생활","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-07-14","max_rows":10}
8
+ {"source_id":"toss-frontend-github-2025","url":"https://toss.tech/article/34895","register":"blog","source_title":"토스 프론트엔드에 이력서 없이 리포지토리 링크로 지원하세요","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-04-08","max_rows":10}
9
+ {"source_id":"toss-people-design-2025","url":"https://toss.tech/article/Tosspeople_LeeJiyoon","register":"blog","source_title":"토스 피플: 이것도 나니까 할 수 있다고 생각하기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-03-27","max_rows":10}
10
+ {"source_id":"toss-ml-platform-2025","url":"https://toss.tech/article/feature-store-trainkit","register":"technical-how-to","source_title":"토스가 다양한 ML 모델을 만드는 법: Feature Store & Trainkit","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-08-14","max_rows":12}
11
+ {"source_id":"toss-gpu-mig-2025","url":"https://toss.tech/article/toss-securities-gpu-mig","register":"technical-how-to","source_title":"GPU를 밀도 있게 쓰는 방법 - 토스증권의 GPU 가상화(MIG) 도입기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","source_published_at":"2025-07-10","max_rows":12}
12
+ {"source_id":"kistep-rnd-agenda-2025","url":"https://www.kistep.re.kr/reportDetail.es?mid=a10305050000&rpt_no=RES0220260011&rpt_tp=831-005","register":"academic-summary","source_title":"2025년 국가연구개발 투자 핵심이슈 발굴 및 아젠다 수립 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-11","max_rows":10}
13
+ {"source_id":"kei-sdgs-2025","url":"https://repository.kei.re.kr/handle/2017.oak/24270","register":"academic-summary","source_title":"지속가능발전목표(SDGs)의 상호작용에 관한 연구","source_license":"Public repository abstract page; raw text remains hash-only pending redistribution review.","source_published_at":"2025-03-31","max_rows":8}
14
+ {"source_id":"nrf-trend-ai-2025","url":"https://webzine.nrf.re.kr/magazine/2503/sub3.php","register":"academic-summary","source_title":"한국연구재단웹진 2025년 3월 트렌드 리포트","source_license":"Public webzine page; raw text remains hash-only pending redistribution review.","source_published_at":"2025-03-01","max_rows":8}
15
+ {"source_id":"kakao-login-doc","url":"https://developers.kakao.com/docs/latest/ko/kakaologin/common","register":"product-doc","source_title":"카카오 로그인 공통 가이드","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":12}
16
+ {"source_id":"naver-login-doc","url":"https://developers.naver.com/docs/login/overview/overview.md","register":"product-doc","source_title":"네이버 로그인 API 명세","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":12}
17
+ {"source_id":"seoul-opengov-copyright","url":"https://opengov.seoul.go.kr/copyright","register":"product-doc","source_title":"서울 열린데이터광장 저작권 정책","source_license":"Public-sector copyright policy page; raw text remains hash-only until row review.","max_rows":10}
18
+ {"source_id":"korea-policy-lowbirth-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148945476","register":"chat-update","source_title":"올해 저출산·고령화 대응에 100조 7000억 원 투입…성과기반 운영","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-07-04","max_rows":10}
19
+ {"source_id":"korea-policy-youth-work-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148947124","register":"chat-update","source_title":"청년 근속 인센티브 조기 지급…7월부터 청년 3282명에","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-07-01","max_rows":6}
20
+ {"source_id":"toss-front-2","url":"https://toss.tech/article/toss_front","register":"blog","source_title":"하마터면 못생겨질 뻔했다 - 토스 프론트 2 제작기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":10}
21
+ {"source_id":"toss-tpm-ai-org","url":"https://toss.tech/article/toss-tpm","register":"blog","source_title":"AI 시대, 성과 내는 조직일수록 토스식 TPM이 필요한 이유","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":10}
22
+ {"source_id":"toss-da-assistant-panda","url":"https://toss.tech/article/da-assistant-panda","register":"technical-how-to","source_title":"토스플레이스 데이터봇 판다(PANDA)를 소개합니다","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":12}
23
+ {"source_id":"toss-starrocks-resource-group","url":"https://toss.tech/article/operating-starrocks-1","register":"technical-how-to","source_title":"StarRocks 운영기: Resource Group으로 멀티테넌트 워크로드 격리하기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":12}
24
+ {"source_id":"toss-post-quantum-crypto","url":"https://toss.tech/article/post-quantum-cryptography","register":"technical-how-to","source_title":"양자컴퓨터 시대에 대비한 양자내성암호 적용","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":12}
25
+ {"source_id":"toss-harness-productivity","url":"https://toss.tech/article/harness-for-team-productivity","register":"technical-how-to","source_title":"Software 3.0 시대, Harness를 통한 조직 생산성 저점 높이기","source_license":"Public web page; no redistribution review complete; raw text remains hash-only.","max_rows":12}
26
+ {"source_id":"kakao-login-faq","url":"https://developers.kakao.com/docs/latest/ko/kakaologin/faq","register":"product-doc","source_title":"카카오 로그인 FAQ","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":12}
27
+ {"source_id":"kakao-login-flutter","url":"https://developers.kakao.com/docs/latest/ko/kakaologin/flutter","register":"product-doc","source_title":"카카오 로그인 Flutter 가이드","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":12}
28
+ {"source_id":"naver-login-api","url":"https://developers.naver.com/docs/login/api/api.md","register":"product-doc","source_title":"네이버 로그인 API 명세","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":10}
29
+ {"source_id":"kakao-android-start","url":"https://developers.kakao.com/docs/latest/ko/android/getting-started","register":"product-doc","source_title":"Kakao SDK for Android 시작하기","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":8}
30
+ {"source_id":"kakao-dev-quota","url":"https://developers.kakao.com/docs/latest/ko/getting-started/quota","register":"product-doc","source_title":"Kakao Developers 쿼터","source_license":"Public developer documentation; no redistribution review complete; raw text remains hash-only.","max_rows":4}
31
+ {"source_id":"kistep-performance-eval-2025","url":"https://www.kistep.re.kr/reportAllDetail.es?mid=a10305010000&rpt_no=RES0220260060&rpt_tp=831-001","register":"academic-summary","source_title":"2025년 성과평가 정책수립·운영 및 평가실시","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-25","max_rows":24}
32
+ {"source_id":"kistep-rnd-consistency-2025","url":"https://www.kistep.re.kr/reportAllDetail.es?mid=a10305010000&rpt_no=RES0220260037&rpt_tp=831-003","register":"academic-summary","source_title":"2025년 R&D사업 예비타당성조사 일관성 제고를 위한 조사체계 개선방향 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-08","max_rows":8}
33
+ {"source_id":"kistep-outcome-policy-2025","url":"https://www.kistep.re.kr/reportDetail.es?mid=a10305050000&rpt_no=RES0220260095&rpt_tp=831-004","register":"academic-summary","source_title":"2025년 연구성과 확산 정책 수립 및 제도 운영","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-03-30","max_rows":8}
34
+ {"source_id":"kistep-rnd-investment-issue-2025","url":"https://www.kistep.re.kr/reportDetail.es?mid=a10305050000&rpt_no=RES0220260070&rpt_tp=831-004","register":"academic-summary","source_title":"2025년 R&D 투자의 성과 활성화 정책이슈 분석 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-25","max_rows":8}
35
+ {"source_id":"kistep-scoreboard-2025","url":"https://www.kistep.re.kr/reportAllDetail.es?mid=a10305010000&rpt_no=RES0220260028&rpt_tp=831-002","register":"academic-summary","source_title":"2025년 과학기술혁신정책 스코어보드 개발 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-25","max_rows":8}
36
+ {"source_id":"korea-policy-youth-basic-2025","url":"https://www.korea.kr/news/policyNewsView.do?newsId=148949763","register":"chat-update","source_title":"정부, 보편적 청년정책 추진…안정적 기본생활 보장","source_license":"Public web page; news images/third-party media excluded; raw text remains hash-only pending page-level redistribution review.","source_published_at":"2025-09-22","max_rows":6}
37
+ {"source_id":"kistep-student-researcher-2025","url":"https://www.kistep.re.kr/reportDetail.es?mid=a10305090000&rpt_no=RES0220260096&rpt_tp=831-002","register":"academic-summary","source_title":"2025년 학생인건비 및 학생연구자 지원 제도 운영 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-03-31","max_rows":4}
38
+ {"source_id":"kistep-research-ethics-2025","url":"https://www.kistep.re.kr/reportAllDetail.es?mid=a10305010000&rpt_no=RES0220260054&rpt_tp=831-004","register":"academic-summary","source_title":"2025년 연구윤리 확보 및 실무지원 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-02-20","max_rows":4}
39
+ {"source_id":"kistep-preliminary-feasibility-2025","url":"https://www.kistep.re.kr/reportAllDetail.es?mid=a10305010000&rpt_no=RES0220260093&rpt_tp=831-003","register":"academic-summary","source_title":"2025년 국가연구개발사업 예비타당성조사 제도 운영 연구","source_license":"Public research summary page; raw text remains hash-only pending redistribution review.","source_published_at":"2026-03-31","max_rows":4}
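The manifest rows above share a small, regular shape. As a rough sketch (not part of the package), the Python below checks one row against the keys that appear in every row shown here. Treating `source_published_at` as optional is an inference from the rows that omit it, and `validate_source_row` is a hypothetical helper name.

```python
import json

# Keys present in every manifest row shown above (inferred, not a published schema).
REQUIRED_KEYS = {
    "source_id", "url", "register",
    "source_title", "source_license", "max_rows",
}


def validate_source_row(line: str) -> dict:
    """Parse one JSONL line and check the inferred required keys."""
    row = json.loads(line)
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    max_rows = row["max_rows"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_rows, bool) or not isinstance(max_rows, int) or max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    return row


# Sample copied from the manifest above (the kakao-dev-quota row).
sample = (
    '{"source_id":"kakao-dev-quota",'
    '"url":"https://developers.kakao.com/docs/latest/ko/getting-started/quota",'
    '"register":"product-doc","source_title":"Kakao Developers 쿼터",'
    '"source_license":"Public developer documentation; no redistribution '
    'review complete; raw text remains hash-only.","max_rows":4}'
)
row = validate_source_row(sample)
```

A row that lacks `source_published_at` still passes, which matches the undated product-doc and blog rows in the manifest.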
@@ -0,0 +1,18 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="164" height="28" viewBox="0 0 164 28" role="img" aria-labelledby="title desc">
2
+ <title id="title">patina score badge</title>
3
+ <desc id="desc">Static patina README badge for repositories that want the mark without publishing a live score endpoint.</desc>
4
+ <defs>
5
+ <linearGradient id="score" x1="0" y1="0" x2="1" y2="1">
6
+ <stop offset="0%" stop-color="#2dd4bf"/>
7
+ <stop offset="100%" stop-color="#34d399"/>
8
+ </linearGradient>
9
+ </defs>
10
+ <rect width="164" height="28" rx="14" fill="#020617"/>
11
+ <rect x="1" y="1" width="162" height="26" rx="13" fill="none" stroke="#1e293b"/>
12
+ <path d="M13 11c3.2-5.4 11-6 15.8-1.8l-3.5 3.5c-2.8-2.2-6.2-1.8-8.2 1z" fill="#c46a2a"/>
13
+ <path d="M28.8 17c-3.2 5.4-11 6-15.8 1.8l3.3-3.7c3 2.5 6.4 2.1 8.5-.8z" fill="#2dd4bf"/>
14
+ <circle cx="20.8" cy="14" r="2.7" fill="#ffe6a8"/>
15
+ <text x="38" y="18" fill="#f8fafc" font-family="Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, Segoe UI, sans-serif" font-size="12" font-weight="800" letter-spacing="-.2">patina</text>
16
+ <rect x="83" y="4" width="77" height="20" rx="10" fill="url(#score)"/>
17
+ <text x="93" y="18" fill="#042f2e" font-family="Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, Segoe UI, sans-serif" font-size="11" font-weight="800">human-ish</text>
18
+ </svg>
@@ -0,0 +1,8 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" role="img" aria-labelledby="title desc">
2
+ <title id="title">patina mark</title>
3
+ <desc id="desc">A transparent copper-to-teal mark around a warm preserved-meaning core.</desc>
4
+
5
+ <path d="M92 196C160 86 320 74 420 164L346 238C288 190 218 198 174 258Z" fill="#c46a2a"/>
6
+ <path d="M420 316C352 426 192 438 92 348L160 270C224 322 294 314 338 254Z" fill="#2dd4bf"/>
7
+ <circle cx="252" cy="248" r="54" fill="#ffe6a8" stroke="#020617" stroke-width="10"/>
8
+ </svg>
@@ -0,0 +1,79 @@
1
+ # Demo assets
2
+
3
+ README hero animations are language-suffixed so each localized README can point
4
+ at a matching demo instead of reusing a Korean recording everywhere.
5
+
6
+ Current assets:
7
+
8
+ - `patina-demo-en.gif` — English README hero. `README_ZH.md` and `README_JA.md`
9
+ intentionally fall back to this English terminal demo until localized ZH/JA
10
+ recordings are worth maintaining.
11
+ - source: `examples/short/marketing-launch-en.md`
12
+ - expected rewrite: `examples/short/marketing-launch-en-rewritten.md`
13
+ - `patina-demo-ko.gif` — Korean README hero.
14
+ - source: `examples/short/marketing-launch.md`
15
+ - expected rewrite: `examples/short/marketing-launch-rewritten.md`
16
+ - Future localized demos should use the same naming pattern:
17
+ `patina-demo-zh.gif`, `patina-demo-ja.gif`.
18
+
19
+ Shared requirements:
20
+
21
+ - verification: each rewritten fixture should pass the 30% content gate before
22
+ the GIF is updated
23
+ - size target: GIF under 10 MB so GitHub renders it reliably
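The size target above can also be checked from a script instead of reading `ls -lh` output by eye. This is a minimal sketch: reading "10 MB" as 10,000,000 bytes is an assumption, and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: "10 MB" is read as decimal megabytes.
MAX_GIF_BYTES = 10 * 1000 * 1000


def within_size_target(size_bytes: int, limit: int = MAX_GIF_BYTES) -> bool:
    """Return True when the size is strictly under the limit."""
    return size_bytes < limit


def gif_within_size_target(path: str) -> bool:
    """Check a file on disk, e.g. assets/demo/patina-demo-en.gif."""
    return within_size_target(Path(path).stat().st_size)
```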
24
+
25
+ ## Deterministic fallback render
26
+
27
+ When a live terminal recorder is not available, render a small transcript GIF
28
+ from checked-in fixtures. This helper is optional and requires local Python +
29
+ Pillow only for asset regeneration; it is not a package runtime dependency.
30
+
31
+ ```bash
32
+ python3 scripts/render-demo-gif.py \
33
+ --lang en \
34
+ --source examples/short/marketing-launch-en.md \
35
+ --rewrite examples/short/marketing-launch-en-rewritten.md \
36
+ --output assets/demo/patina-demo-en.gif \
37
+ --title "patina demo — English" \
38
+ --score-line "PASS · score 0.0% · MPS: meaning preserved"
39
+
40
+ node scripts/precommit-score.mjs examples/short/marketing-launch-en-rewritten.md
41
+ ls -lh assets/demo/patina-demo-en.gif
42
+
43
+ # Korean variant, when it needs to be regenerated:
44
+ python3 scripts/render-demo-gif.py \
45
+ --lang ko \
46
+ --source examples/short/marketing-launch.md \
47
+ --rewrite examples/short/marketing-launch-rewritten.md \
48
+ --output assets/demo/patina-demo-ko.gif \
49
+ --title "patina demo — Korean" \
50
+ --score-line "PASS · score under 30% · MPS: meaning preserved"
51
+
52
+ node scripts/precommit-score.mjs examples/short/marketing-launch-rewritten.md
53
+ ```
54
+
55
+ ## Preferred live terminal capture
56
+
57
+ Use this when `asciinema` and `agg` are installed and you want to capture a real
58
+ terminal session instead of the deterministic fallback render:
59
+
60
+ ```bash
61
+ # 1) Record a real terminal session using the checked-in language fixture.
62
+ asciinema rec /tmp/patina-demo-en.cast
63
+
64
+ # In the recording, run:
65
+ cat examples/short/marketing-launch-en.md
66
+ patina --lang en --tone marketing examples/short/marketing-launch-en.md
67
+ node scripts/precommit-score.mjs examples/short/marketing-launch-en-rewritten.md
68
+
69
+ # 2) Render to a GitHub-safe GIF. Do not use animated SVG for README motion.
70
+ agg /tmp/patina-demo-en.cast assets/demo/patina-demo-en.gif \
71
+ --cols 82 --rows 24 \
72
+ --font-family "DejaVu Sans Mono"
73
+
74
+ # 3) Keep the asset small and verify the rewritten fixture still passes.
75
+ ls -lh assets/demo/patina-demo-en.gif
76
+ node scripts/precommit-score.mjs examples/short/marketing-launch-en-rewritten.md
77
+ ```
78
+
79
+ If `agg` output is too large, compress with `gifsicle -O3` or re-record a shorter cast.
package/core/scoring.md CHANGED
@@ -76,52 +76,52 @@ Unknown categories (from custom packs not in the weight config) get default weig
76
76
  | Category | Weight | Patterns | Notes |
77
77
  |----------|--------|----------|-------|
78
78
  | content | 0.18 | 6 | |
79
- | language | 0.18 | 7 | |
79
+ | language | 0.18 | 8 | |
80
80
  | style | 0.18 | 6 | |
81
81
  | communication | 0.13 | 4 | |
82
82
  | filler | 0.08 | 4 | |
83
83
  | structure | 0.15 | 5 | |
84
- | viral-hook | 0.10 | 8 | score-only (no rewrite) |
85
- | **Total** | **1.00** | **40** | |
84
+ | viral-hook | 0.10 | 9 | score-only (no rewrite) |
85
+ | **Total** | **1.00** | **42** | |
86
86
 
87
87
  ### English (en)
88
88
 
89
89
  | Category | Weight | Patterns | Notes |
90
90
  |----------|--------|----------|-------|
91
91
  | content | 0.20 | 6 | |
92
- | language | 0.20 | 7 | |
92
+ | language | 0.20 | 8 | |
93
93
  | style | 0.20 | 6 | |
94
94
  | communication | 0.12 | 4 | |
95
95
  | filler | 0.08 | 4 | |
96
96
  | structure | 0.10 | 5 | |
97
- | viral-hook | 0.10 | 8 | score-only (no rewrite) |
98
- | **Total** | **1.00** | **40** | |
97
+ | viral-hook | 0.10 | 9 | score-only (no rewrite) |
98
+ | **Total** | **1.00** | **42** | |
99
99
 
100
100
  ### Chinese (zh)
101
101
 
102
102
  | Category | Weight | Patterns | Notes |
103
103
  |----------|--------|----------|-------|
104
104
  | content | 0.18 | 6 | |
105
- | language | 0.18 | 7 | |
105
+ | language | 0.18 | 8 | |
106
106
  | style | 0.18 | 6 | |
107
107
  | communication | 0.13 | 4 | |
108
108
  | filler | 0.08 | 4 | |
109
109
  | structure | 0.15 | 5 | |
110
- | viral-hook | 0.10 | 8 | score-only (no rewrite) |
111
- | **Total** | **1.00** | **40** | |
110
+ | viral-hook | 0.10 | 9 | score-only (no rewrite) |
111
+ | **Total** | **1.00** | **42** | |
112
112
 
113
113
  ### Japanese (ja)
114
114
 
115
115
  | Category | Weight | Patterns | Notes |
116
116
  |----------|--------|----------|-------|
117
117
  | content | 0.18 | 6 | |
118
- | language | 0.18 | 7 | |
118
+ | language | 0.18 | 8 | |
119
119
  | style | 0.18 | 6 | |
120
120
  | communication | 0.13 | 4 | |
121
121
  | filler | 0.08 | 4 | |
122
122
  | structure | 0.15 | 5 | |
123
- | viral-hook | 0.10 | 8 | score-only (no rewrite) |
124
- | **Total** | **1.00** | **40** | |
123
+ | viral-hook | 0.10 | 9 | score-only (no rewrite) |
124
+ | **Total** | **1.00** | **42** | |
125
125
 
126
126
  Weights are configurable via `ouroboros.category-weights.{lang}` in `.patina.yaml`.
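Since the weights are user-configurable, a quick consistency check can catch a table whose weights no longer sum to 1.00. The sketch below copies the English and Chinese rows from the tables above; `check_table` is a hypothetical helper, not something the package is known to ship.

```python
import math

# (weight, pattern count) per category, copied from the tables above.
EN_TABLE = {
    "content": (0.20, 6), "language": (0.20, 8), "style": (0.20, 6),
    "communication": (0.12, 4), "filler": (0.08, 4), "structure": (0.10, 5),
    "viral-hook": (0.10, 9),
}
ZH_TABLE = {
    "content": (0.18, 6), "language": (0.18, 8), "style": (0.18, 6),
    "communication": (0.13, 4), "filler": (0.08, 4), "structure": (0.15, 5),
    "viral-hook": (0.10, 9),
}


def check_table(table: dict) -> tuple:
    """Return (weights sum to 1.0 within float tolerance, total pattern count)."""
    total_weight = sum(weight for weight, _ in table.values())
    total_patterns = sum(count for _, count in table.values())
    return math.isclose(total_weight, 1.0, abs_tol=1e-9), total_patterns
```

Both tables should report a weight sum of 1.00 and 42 patterns, in line with the updated **Total** rows.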
127
127
 
@@ -123,7 +123,7 @@ FOR each anchor IN anchor_list:
123
123
  Apply all remaining pattern packs (content, language, style, communication, filler).
124
124
 
125
125
  1. **AI pattern identification** — scan all loaded sentence/lexical patterns
126
- 2. **Problem segment rewrite** — replace AI-sounding expressions with natural alternatives
126
+ 2. **Problem segment rewrite** — do not swap tokens in place; read the local context and rewrite the affected clause/sentence into a natural alternative
127
127
  3. **Meaning preservation** — keep core message intact
128
128
  4. **Tone matching** — adjust tone per the profile's guidance
129
129
  5. **Voice injection** — add personality per `core/voice.md`
@@ -134,6 +134,8 @@ Apply all remaining pattern packs (content, language, style, communication, fill
134
134
  - MEDIUM semantic risk: inject only Polarity/Negation anchors
135
135
  - LOW semantic risk: no constraints
136
136
 
137
+ **CJK clause-level rewrite guard (issue #352):** For `ko`, `zh`, and `ja`, do not fix AI tells by replacing one punctuation mark or one token at a time. If connective punctuation (em dash, colon, semicolon, slash, comma splice, parenthetical aside) appears with a suspect phrase, read the whole sentence and choose an idiomatic clause structure, sentence split, or connective phrase in the target language. If a translationese/calque phrase is attached to punctuation, fix both together at clause level. Korean examples: prefer `TUI 없이 완전 자율로 설치하려면 ...` over `무 TUI ...`, and `"끝난 것 같아요"만으로는 부족한, 결과를 끝까지 확인해야 하는 열린 작업` over `"끝난 것 같아요"로는 부족한 열린 작업`. Preserve actors, polarity, conditions, numbers, and causation.
138
+
137
139
  **Caution**: Do NOT re-tidy sections already corrected in Phase 1 back into "polished officialese".
138
140
 
139
141
  #### 5b-v: Anchor Verification
@@ -24,6 +24,8 @@ description: Deterministic statistical preprocessing that flags suspect paragrap
24
24
  hot 단락/문장 정보를 5a·5b 입력에 주입한다
25
25
  - 외부 의존성(형태소 분석기, 외부 detector API) 없이 ko/en은 whitespace 토큰화,
26
26
  zh/ja는 deterministic character-token fallback으로 동작한다
27
+ - ko는 dependency-free 보조 지표(띄어쓰기 규칙성, 쉼표 밀도, 조사/어미 suffix diversity
28
+ proxy)를 함께 기록하고, 세 지표가 모두 보수적 임계값을 넘을 때만 ko 전용 hot 신호로 사용한다
27
29
 
28
30
  이 알고리즘은 5a 구조 분석 단계와 5b 문장/어휘 단계에 자연스럽게 결합되도록 설계되었다.
29
31
  - 5a는 단락 수준 hot 표시를 우선 검토 대상으로 사용한다
@@ -33,7 +35,7 @@ description: Deterministic statistical preprocessing that flags suspect paragrap
33
35
 
34
36
  | 언어 | 지원 | 비고 |
35
37
  |------|---------|------|
36
- | ko | yes | 어절 단위 토큰화 |
38
+ | ko | yes | 어절 단위 토큰화 + dependency-free diagnostic signals |
37
39
  | en | yes | 단어 단위 토큰화 |
38
40
  | zh | yes | Han character-token fallback |
39
41
  | ja | yes | Kana/Han character-token fallback; 형태소 분석 없음 |
@@ -81,6 +83,10 @@ ELSE:
81
83
  > sudachi, mecab 같은 형태소 분석기는 배포 의존성과 설치 실패면을 늘리므로 기본 경로에서
82
84
  > 제외한다.
83
85
 
86
+ ko diagnostic POS diversity는 형태소 분석기가 아니라 suffix proxy다. 조사/어미 후보 목록으로
87
+ 어절 끝을 보수적으로 분류하고, class diversity와 coverage를 기록한다. 이 proxy는 진짜 POS
88
+ tagger가 아니므로 단독 hot 신호가 아니라 spacing/comma와 결합된 ko composite에서만 사용한다.
89
+
84
90
  ---
85
91
 
86
92
  ## 3. Sentence Splitting
@@ -126,6 +132,8 @@ burstiness_CV = stddev / mean
126
132
  - `CV` = Coefficient of Variation (변동계수)
127
133
  - mean 이 0 이면 정의되지 않음 → skip 처리
128
134
  - population stddev 사용 (표본 보정 없음, 단순화)
135
+ - 단락 문장이 3개 미만이면 CV 값은 기록하되 burstiness band는 부여하지 않는다.
136
+ 2문장 이하 샘플은 길이 변동성이 너무 불안정해서 hot 판정에서 제외한다.
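The rule above (population stddev, skip when the mean is 0, no band below three sentences) can be sketched as follows. This assumes sentence lengths are token counts and uses a hypothetical function name; the band thresholds themselves are left out because they live in the config.

```python
import statistics


def burstiness_cv(sentence_lengths, min_sentences_for_band=3):
    """Return (cv, band_eligible).

    cv is None when it is undefined (no sentences or mean == 0).
    band_eligible is False for paragraphs with fewer than three sentences:
    the CV is still reported, but no burstiness band should be assigned.
    """
    if not sentence_lengths:
        return None, False
    mean = statistics.fmean(sentence_lengths)
    if mean == 0:
        return None, False
    # Population standard deviation, with no sample correction.
    cv = statistics.pstdev(sentence_lengths) / mean
    return cv, len(sentence_lengths) >= min_sentences_for_band
```

A two-sentence paragraph therefore yields a CV value with `band_eligible == False`, so it cannot become hot through burstiness alone.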
129
137
 
130
138
  ### 밴드
131
139
 
@@ -181,26 +189,68 @@ ELSE:
181
189
 
182
190
  ---
183
191
 
192
+ ## 5.1 Korean Diagnostic Signals
193
+
194
+ 한국어는 어절 토큰화만으로는 조사/어미 변형과 띄어쓰기 습관을 충분히 보지 못한다. 그래서
195
+ `analyzeText(text, { lang: "ko" })`는 다음 보조 지표를 단락 결과에 추가한다.
196
+
197
+ | 필드 | 지표 | 목적 | 현재 판정 영향 |
198
+ |------|------|------|----------------|
199
+ | `spacing` | `eojeolLengthCV`, 평균 어절 길이, 1음절/긴 어절 비율 | LLM식으로 지나치게 균질한 띄어쓰기·어절 길이 후보 추적 | composite 조건 |
200
+ | `comma` | 쉼표 수, 문장당 쉼표, 100자당 쉼표 | 쉼표 리듬이 사라진 장문 후보 추적 | composite 조건 |
201
+ | `posDiversity` | 조사/어미 suffix class coverage/diversity proxy | 형태소 분석기 없이 기능어 다양성 후보 추적 | composite 조건 |
202
+ | `koDiagnostics` | 세 지표의 보수적 AND 판정 | burstiness/MATTR/lexicon이 놓치는 ko hotspot 보강 | hot OR-rule 참여 |
203
+
204
+ POS proxy 결정:
205
+ - 새 런타임 의존성은 추가하지 않는다.
206
+ - KoNLPy/mecab-ko/khaiii 같은 형태소 분석기는 설치 실패면과 배포 크기가 커서 기본 경로에서
207
+ 제외한다.
208
+ - suffix proxy는 `은/는`, `이/가`, `을/를`, `에게/에서`, `습니다/합니다` 같은 조사·어미
209
+ class를 세고, `proxy: "suffix"`로 명시한다.
210
+
211
+ 운영 규칙:
212
+ - 세 조건을 모두 만족할 때만 `koDiagnostics.hot=true`가 된다.
213
+ - 기본값: `minSentences=4`, `minEojeols=20`, `spacing.eojeolLengthCV <= 0.38`,
214
+ `comma.perSentence < 1`, `posProxy.matchedCount >= 10`,
215
+ `posProxy.classDiversity <= 0.26`.
216
+ - 쉼표가 적다는 사실만으로 hot 처리하지 않는다. spacing과 suffix diversity가 동시에
217
+ 맞아야 한다.
218
+ - `.patina.default.yaml`의 `stylometry.ko_diagnostics.enabled=false`로 이 composite만 끌 수
219
+ 있다. 기존 burstiness/MATTR/lexicon 신호는 그대로 동작한다.
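The operating rules above amount to a conservative AND over three conditions. The sketch below uses the default thresholds listed there. Treating `minSentences` and `minEojeols` as gates that must pass before the three conditions are evaluated is an assumption, and the Python names are illustrative rather than the package's actual fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KoThresholds:
    # Defaults copied from the operating rules above.
    min_sentences: int = 4
    min_eojeols: int = 20
    max_eojeol_length_cv: float = 0.38     # spacing.eojeolLengthCV <= 0.38
    commas_per_sentence_below: float = 1.0  # comma.perSentence < 1 (strict)
    min_suffix_matches: int = 10            # posProxy.matchedCount >= 10
    max_class_diversity: float = 0.26       # posProxy.classDiversity <= 0.26


def ko_diagnostics_hot(sentences, eojeols, eojeol_length_cv,
                       commas_per_sentence, suffix_matches, class_diversity,
                       t=KoThresholds()):
    """Return True only when all three ko conditions hold together."""
    # Assumed gate: paragraphs that are too short never become hot.
    if sentences < t.min_sentences or eojeols < t.min_eojeols:
        return False
    spacing_ok = eojeol_length_cv <= t.max_eojeol_length_cv
    comma_ok = commas_per_sentence < t.commas_per_sentence_below
    suffix_ok = (suffix_matches >= t.min_suffix_matches
                 and class_diversity <= t.max_class_diversity)
    return spacing_ok and comma_ok and suffix_ok
```

Because the result is an AND, a paragraph with few commas but varied suffix classes stays cold, which is the behavior the third rule describes.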
220
+
221
+ Calibration note (2026-05-22): `npm run benchmark:katfish-ko -- --write --basename katfish-ko-latest` evaluates this zero-dependency composite against a private KatFish checkout plus the 250-row public-web KO human-control set. Current aggregate result: KatFish catch rate improves from 58.9% without KO diagnostics to 74.8% with KO diagnostics (+15.9 pp), while the public-web human-control FP rate stays 42/250 (16.8%, +0 rows). The KatFish raw rows are not committed because the upstream repository does not expose repo-level license metadata; only aggregate reports are tracked.
222
+
223
+ ---
224
+
184
225
  ## 6. Hot Decision Rule
185
226
 
186
- 단락 수준 SUSPECT 판정은 단순한 OR 규칙으로 결정한다.
227
+ 단락 수준 SUSPECT 판정은 단순한 OR 규칙으로 결정한다. v3.7 이후 lexicon density가
228
+ 세 번째 hot 신호로 추가되었다(상세 calibration은 §16).
187
229
 
188
230
  ```
189
231
  paragraph is SUSPECT iff
190
- burstiness_band == "low" OR MATTR_band == "low"
232
+ burstiness_band == "low" OR MATTR_band == "low" OR
233
+ (lexicon_density > threshold AND lexicon_min_hits is satisfied) OR
234
+ koDiagnostics.hot == true
191
235
  ```
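As a sketch, the updated rule is a plain OR over four signals. The lexicon and ko inputs are assumed to be precomputed booleans, and a band of `None` stands for "no band assigned" (for example, a paragraph with fewer than three sentences). The function name is hypothetical.

```python
def paragraph_is_suspect(burstiness_band, mattr_band, lexicon_hot, ko_hot):
    """4-signal OR: any single hot signal marks the paragraph SUSPECT.

    lexicon_hot is expected to already combine the density threshold with
    the minimum-hit requirement; ko_hot is the koDiagnostics AND composite.
    """
    return (
        burstiness_band == "low"
        or mattr_band == "low"
        or bool(lexicon_hot)
        or bool(ko_hot)
    )
```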
192
236
 
193
237
  ### 근거
194
238
 
195
- - 신호 중 하나만 약해도 사람 글에서는 드물게 동시 발생한다
239
+ - 신호 중 하나만 약해도 우선 검토할 만한 edit hotspot이 된다
196
240
  - AND 조건은 recall 이 너무 낮아 v1 acceptance criteria(AI 시드 7/10)를 만족하기 어렵다
197
241
  - false positive 는 `FalsePositiveControl` 평가(자연 시드 ≤2/10)로 견제한다
242
+ - ko/zh/ja lexicon은 단일 hit를 hot 판정으로 쓰지 않고 audit hint로만 남긴다. 짧은
243
+ 한국어 전문/정책 문단에서 `기반`, `흐름` 같은 보통 명사가 단독으로 터지는 오탐을
244
+ 막기 위해 CJK 기본 `lexicon_min_hits`는 2다.
245
+ - ko diagnostic signal은 내부에서 AND composite를 사용하므로 쉼표/조사 같은 단일 특징만으로는
246
+ OR-rule에 들어오지 않는다
198
247
 
199
248
  ### 임계값 재조정
200
249
 
201
250
  v1 결과에서 false positive 가 한도(자연 시드 2/10)를 넘기면, `.patina.default.yaml`의
202
251
  `stylometry.burstiness.bands.low` 또는 `stylometry.ttr.bands.low` 를 보수적으로 낮춰
203
- 재평가한다 (예: 0.30 → 0.27).
252
+ 재평가한다 (예: 0.30 → 0.27). lexicon 쪽은 `lexicon.density_threshold`로 조정한다.
253
+ ko composite는 `stylometry.ko_diagnostics.bands`에서 조정한다.
204
254
 
205
255
  ---
206
256
 
@@ -244,7 +294,7 @@ emit groups of length ≥ 2 as sub-flags
244
294
  |------|------|
245
295
  | 단락 수 ≤ 2 | 4.6단계 전체 skip — meta block 생략 |
246
296
  | 전체 문장 수 ≤ 2 | 4.6단계 전체 skip — meta block 생략 |
247
- | 단락 내 문장 수 < 2 | 해당 단락은 burstiness 계산 불가 → MATTR 만 평가 |
297
+ | 단락 내 문장 수 < 3 | CV 값은 기록하지만 burstiness band/hot 판정은 생략 → MATTR/lexicon 만 평가 |
248
298
  | 단락 내 토큰 수 = 0 | 해당 단락 skip |
249
299
  | 언어가 `stylometry.languages` 에 없음 | 4.6단계 전체 skip (기본값은 ko/en/zh/ja 포함) |
250
300
 
@@ -441,8 +491,9 @@ optional dependency로 설치 실패가 없는 경로, (2) ko/en/zh/ja benchmark
441
491
  | ~~n-gram redundancy~~ | ~~bi/trigram 반복도~~ — **dropped, §15 negative finding** | — |
442
492
  | ~~AI-lexicon overlap~~ | ~~28-패턴 외 AI 특유 어구 사전 매칭~~ — **shipped v3.7, §16** | v3.7 |
443
493
  | ~~zh/ja character fallback~~ | ~~Han/Kana character-token burstiness/TTR~~ — **shipped for 4.6** | current |
494
+ | ~~ko diagnostic composite~~ | ~~spacing/comma/suffix proxy~~ — **shipped as conservative AND signal, §5.1** | current |
444
495
  | Perplexity proxy | small LM 또는 cloze prompt 기반 token-level surprise | v4 후보 |
445
- | Function-word distribution | 기능어 빈도 분포 차이 (Mosteller & Wallace 류) | v4 후보 |
496
+ | Broader function-word distribution | en/zh/ja 포함 기능어 빈도 분포 차이 | v4 후보 |
446
497
  | GPTZero / Originality 연동 | 외부 detector API 결과를 hot 신호로 합산 | v4+ 후보 |
447
498
  | 형태소 분석 기반 ko 토큰화 | 어절 → 형태소 단위로 정밀화 | v2+ |
448
499
  | zh/ja 형태소 분석 통합 | 단어 경계 처리; 위 go/no-go 충족 시에만 재검토 | v4+ 후보 |
@@ -456,6 +507,9 @@ optional dependency로 설치 실패가 없는 경로, (2) ko/en/zh/ja benchmark
456
507
  - **Korean morphology coarseness**: 어절 단위 토큰화는 morpheme-level 분석과 다르다.
457
508
  같은 명사의 조사 변형(`도구는`, `도구가`, `도구를`)을 서로 다른 token 으로 취급하므로
458
509
  MATTR 이 실제보다 약간 높게 나올 수 있다.
510
+ - **Korean diagnostic composite is conservative**: 쉼표 없음 + 균질한 어절 길이 + 낮은 suffix
511
+ class diversity가 모두 맞아야 하므로 recall보다는 false-positive 억제를 우선한다. 실제
512
+ KatFish/2025+ corpus 재보정 전에는 넓은 성능 주장으로 쓰지 않는다.
459
513
  - **Short text noise**: 단락 ≤2 또는 문장 ≤2 인 텍스트는 통계적으로 신뢰할 수 없어 skip
460
514
  한다. 이 경우 4.6단계 전체가 비활성화된다.
461
515
  - **Window=50 fallback**: MATTR window 보다 짧은 단락은 simple TTR 로 fallback 하므로
@@ -467,8 +521,8 @@ optional dependency로 설치 실패가 없는 경로, (2) ko/en/zh/ja benchmark
467
521
  - **False split**: 약어(`Mr.`, `e.g.`) 와 소수점(`3.14`) 같은 종결부호 오인식을 v1 에서
468
522
  감수한다. 시드 평가에서 false positive 비율이 한도를 넘기면 v1.1 에서 보강한다.
469
523
  - **Language scope**: 4.6 stylometry는 ko/en/zh/ja를 지원한다. zh/ja는 형태소가 아니라
470
- character-token fallback이므로 MATTR 해석은 보수적으로만 사용한다. 4.7 lexicon은
471
- curated 사전이 있는 en/ko만 기본 지원한다.
524
+ character-token fallback이므로 MATTR 해석은 보수적으로만 사용한다. 4.7 lexicon은
525
+ en/ko/zh/ja 기본 사전을 제공한다.
472
526
 
473
527
  ---
474
528
 
@@ -612,33 +666,37 @@ For paragraph P with tokens T:
612
666
  substring for "Multi-word phrases")
613
667
  density = matches / len(T) * 1000 # matches per 1000 tokens
614
668
 
615
- hot iff density > threshold
669
+ hot iff density > threshold AND min_hot_matches is satisfied
616
670
  ```
617
671
 
618
672
  기본 threshold = `2.0` (1,000 토큰당 2회). `lexicon.density_threshold`로 설정 가능.
673
+ 기본 `min_hot_matches`는 영어 1, 한국어/중국어/일본어 2다. CJK 단일 lexicon hit는
674
+ audit hint로 표시하지만 단락을 hot으로 만들지는 않는다.
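The density formula and the per-language minimum can be put together as below. This is a sketch under stated assumptions: the dictionary and function names are hypothetical, and reporting every non-hot hit as an audit hint is a simplification of the CJK single-hit behavior described above.

```python
# Per-language minimum matches, from the defaults stated above.
MIN_HOT_MATCHES = {"en": 1, "ko": 2, "zh": 2, "ja": 2}


def lexicon_signal(matches, token_count, lang, density_threshold=2.0):
    """Return density (matches per 1,000 tokens), hot flag, and audit hint."""
    if token_count == 0:
        return {"density": 0.0, "hot": False, "audit_hint": False}
    density = matches / token_count * 1000
    hot = density > density_threshold and matches >= MIN_HOT_MATCHES[lang]
    # Simplification: any hit that does not make the paragraph hot is
    # surfaced as an audit hint only.
    return {"density": density, "hot": hot, "audit_hint": matches > 0 and not hot}
```

With one hit in a 100-token paragraph the density is well above 2.0, yet a Korean paragraph stays cold while an English one turns hot. That difference is the point of the CJK minimum.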
619
675
 
620
- ### 단락 hot 결정 규칙 (3-signal OR)
676
+ ### 단락 hot 결정 규칙 (v3.7 당시 3-signal OR)
621
677
 
622
678
  ```
623
679
  paragraph is SUSPECT iff
624
- burstiness_band == "low" OR MATTR_band == "low" OR lexicon_density > threshold
680
+ burstiness_band == "low" OR MATTR_band == "low" OR
681
+ (lexicon_density > threshold AND min_hot_matches is satisfied)
625
682
  ```
626
683
 
627
- §6의 2-signal OR 규칙을 3-signal OR로 확장한다. burstiness/MATTR는 분포적 신호, lexicon은 어휘적 신호 — 다른 축이라 OR 결합 시 둘 다 합산 효과를 낸다.
684
+ v3.7에서는 §6의 2-signal OR 규칙을 3-signal OR로 확장했다. burstiness/MATTR는 분포적 신호, lexicon은 어휘적 신호 — 다른 축이라 OR 결합 시 둘 다 합산 효과를 냈다. 현재 전체 규칙은 §6의 4-signal OR이며, ko diagnostic composite가 네 번째 축이다.
628
685
 
629
686
  ### 사전 파일
630
687
 
631
- - `lexicon/ai-en.md` — 영어 50개 strict + 58개 phrase = 108 entries
632
- - `lexicon/ai-ko.md` — 한국어 41개 strict + 49개 phrase = 90 entries
688
+ - `lexicon/ai-en.md` — 영어 43개 strict + 45개 phrase = 88 entries
689
+ - `lexicon/ai-ko.md` — 한국어 43개 strict + 53개 phrase = 96 entries
690
+ - `lexicon/ai-zh.md` — 중국어 60개 phrase = 60 entries
691
+ - `lexicon/ai-ja.md` — 일본어 60개 phrase = 60 entries
633
692
 
634
- 탐색은 `Glob lexicon/ai-{lang}.md`로 자동. zh/ja는 4.6 stylometry만 기본 지원하고
635
- 4.7 lexicon은 curated 사전이 없어 기본 미지원(`lexicon.languages` 기본 `[en, ko]`).
636
- 사용자가 `custom/lexicon/ai-{lang}.md` 를 두면 우선 로드.
693
+ 탐색은 `Glob lexicon/ai-{lang}.md`로 자동. 기본 `lexicon.languages`는
694
+ `[en, ko, zh, ja]`이다. 사용자가 `custom/lexicon/ai-{lang}.md` 를 두면 우선 로드.
637
695
 
638
696
  ### 매칭 정책
639
697
 
640
- - **Strict matches** (Markdown `## Strict matches` 섹션): 대소문자 무시 whole-word 매칭. 한국어 어절(예: `자리매김`)은 substring으로 근사한다. Korean punctuation/space로 분리되므로 false attach 위험이 낮다
641
- - **Multi-word phrases** (Markdown `## Multi-word phrases` 섹션): 대소문자 무시 substring 매칭. `~` 가 포함된 한국어 phrase(예: `~의 지평을 넓히다`)는 `~` 을 wildcard 로 취급 (`.{0,40}`)
698
+ - **Strict matches** (Markdown `## Strict matches` 섹션): 대소문자 무시 whole-word 매칭. CJK(ko/zh/ja)는 substring으로 근사한다. 한국어 어절과 zh/ja character fallback에서는 multi-character entry가 통째 token으로 남지 않기 때문이다
699
+ - **Multi-word phrases** (Markdown `## Multi-word phrases` 섹션): 대소문자 무시 substring 매칭. `~` 가 포함된 phrase(예: `~의 지평을 넓히다`, `不仅~而且`, `~と言えるでしょう`)는 `~` 을 wildcard 로 취급 (`.{0,40}`)
642
700
  - 한 paragraph 안에서 같은 entry 가 여러 번 나와도 1로 계산 (entry 단위 카운트)
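The phrase policy above (case-insensitive substring match, `~` as a `.{0,40}` wildcard, each entry counted once per paragraph) can be sketched with the standard `re` module. The function names are hypothetical, the entries are the examples quoted above, and strict whole-word matching for English is not covered.

```python
import re


def phrase_to_regex(entry: str) -> "re.Pattern":
    """Compile a phrase entry, treating '~' as a bounded wildcard."""
    # Escape the literal parts, then join them with the wildcard.
    literal_parts = [re.escape(part) for part in entry.split("~")]
    return re.compile(".{0,40}".join(literal_parts), re.IGNORECASE)


def count_entry_hits(paragraph: str, entries) -> int:
    """Entry-level count: an entry scores 1 however often it appears."""
    return sum(1 for entry in entries if phrase_to_regex(entry).search(paragraph))
```

For instance, a paragraph that uses `~의 지평을 넓히다` twice still contributes a single match to the density calculation.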
 
  ### Extending the LLM handoff format
@@ -694,7 +752,7 @@ Pareto frontier (3-signal OR, threshold sweep):
 
  ### Rationale for the threshold choice
 
- `density_threshold = 2.0` is adopted. Catch/FP is identical anywhere in the 0.5–5.0 plateau, so the spec default (2.0) is used. Operational meaning: "a paragraph is suspect when more than 2 AI lexicon entries appear per 1,000 tokens". This matches the spec's §3 Recommendation.
+ `density_threshold = 2.0` is adopted. Catch/FP is identical anywhere in the 0.5–5.0 plateau, so the spec default (2.0) is used. Operational meaning: "a paragraph is suspect when more than 2 AI lexicon entries appear per 1,000 tokens and the per-language minimum hit count is met". This matches the spec's §3 Recommendation. After the 2026-05 Korean 25-row register pilot, a single CJK hit was demoted from hot to an audit hint.
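
As a rough worked example of the rule in the added line, the sketch below combines the density threshold with a per-language minimum hit count. `DENSITY_THRESHOLD` comes from the text; the values in `MIN_HOT_MATCHES`, the function names, and the three verdict labels are assumptions for illustration, since the text only says that a single CJK hit no longer counts as hot.

```python
DENSITY_THRESHOLD = 2.0  # lexicon entries per 1,000 tokens, from the text above

# Hypothetical per-language minimums: the text only states that a single CJK
# hit is an audit hint rather than hot, so 2 for ko/zh/ja and 1 for en are guesses.
MIN_HOT_MATCHES = {"en": 1, "ko": 2, "zh": 2, "ja": 2}


def lexicon_density(entry_hits: int, token_count: int) -> float:
    """Distinct lexicon entries matched per 1,000 tokens."""
    if token_count <= 0:
        return 0.0
    return entry_hits * 1000.0 / token_count


def lexicon_verdict(entry_hits: int, token_count: int, lang: str) -> str:
    """Return "hot", "audit_hint", or "cold" for one paragraph."""
    if lexicon_density(entry_hits, token_count) <= DENSITY_THRESHOLD:
        return "cold"  # density must strictly exceed the threshold
    if entry_hits >= MIN_HOT_MATCHES.get(lang, 1):
        return "hot"
    return "audit_hint"  # dense, but too few hits for this language
```

Under these assumptions a 120-token Korean paragraph with one hit has a density of about 8.3 yet is only an audit hint, while the same paragraph with two hits is hot. This shows why the minimum hit count matters for short paragraphs, where a single match easily clears the density threshold.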
 
  ### Calibration drop list
 
@@ -714,7 +772,17 @@ lexicon 은 다음 카탈로그 항목과 **중복되지 않도록** 큐레이
  - `ko-language.md` Pattern 7, 8 (다양한, 활발한, 혁신적인, the ~적 suffix, etc.): already covered by the catalog
  - `ko-style.md` Pattern 13, 18 (이를 통해, 도모하다, 본 사업, etc.): already covered by the catalog
 
- The lexicon's 50+58/41+49 entries were narrowed to the areas the patterns above do not explicitly catch (modal scaffolding, abstract nouns, ceremonial opening/closing phrases).
+ The lexicon's en 88 / ko 96 / zh 60 / ja 60 entries were narrowed to the areas the patterns above do not explicitly catch (modal scaffolding, abstract nouns, ceremonial opening/closing phrases).
+
+ ### zh/ja lexicon starter pack (v3.11.x)
+
+ `lexicon/ai-zh.md` and `lexicon/ai-ja.md` each start with 60 phrase-only entries.
+ Because whitespace tokens are not stable in these two languages, every default entry is placed in the
+ `Multi-word phrases` section and handled by substring/wildcard matching. The regression fixtures add
+ lexicon-only hot samples and natural cold counterexamples for each language; under `npm run benchmark`,
+ both zh and ja hold at 0/4 false positives and 4/4 AI catch. When expanding to a larger external corpus,
+ review each file's counterexample table first, and drop entries that fire similarly
+ in encyclopedia or news registers.
 
  ### Korean lexicon validation results + v3.8.0 re-curation
 
@@ -727,6 +795,9 @@ ko/AI 코퍼스 vs NamuWiki human 차별 빈도 마이닝(`.omc/research/v3_8_ko
  - Strict (8): `평가된다`, `꼽힌다`, `가리킨다`, `사례로`, `다수의`, `알려져`, `일컬어진다`, `평가받다`
  - Phrases (4): `가운데 하나로`, `자리 잡았다`, `알려져 있다`, `~의 사례로`
 
+ **v3.12.x false-positive pruning (96 entries)**:
+ `환경`, `기반`, `흐름`, `측면`, `토대`, and `가리킨다` are bare strict entries that overheat short paragraphs in concrete technical-documentation usages (`환경 변수`, `이벤트 기반`, `인증 흐름`, etc.), so they were removed from the default lexicon. The same register excess is caught by more specific phrases or retained markers (`자리매김`, `중요한 의미`, `생태계`, `양상`, etc.).
+
  **v3.8.0 results** (re-measured on the same corpus, `.omc/research/v3_8_remeasure.py`):
 
  | Source | n | v3.7.0 hot | v3.8.0 hot | Δ | lex fires |