npm - patina-cli - Versions diffs - 3.11.0 → 4.0.0 - Mend

patina-cli 3.11.0 → 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (193)

package/.patina.default.yaml +29 -29
package/CHANGELOG.md +53 -0
package/NOTICE +21 -0
package/README.md +117 -224
package/README_JA.md +134 -77
package/README_KR.md +132 -74
package/README_ZH.md +137 -80
package/SKILL.md +11 -20
package/artifacts/rebaseline-2025/README.md +147 -0
package/artifacts/rebaseline-2025/human-controls.public.jsonl +250 -0
package/artifacts/rebaseline-2025/intake.example.jsonl +2 -0
package/artifacts/rebaseline-2025/intake.local.example.jsonl +25 -0
package/artifacts/rebaseline-2025/prompts.template.jsonl +7 -0
package/artifacts/rebaseline-2025/sources.ko-public.jsonl +39 -0
package/assets/brand/patina-badge.svg +18 -0
package/assets/brand/patina-mark.svg +8 -0
package/assets/demo/README.md +79 -0
package/core/scoring.md +12 -12
package/core/standalone-prompt.md +3 -1
package/core/stylometry.md +93 -22
package/docs/API.md +1554 -0
package/docs/AUTHENTICATION.md +50 -26
package/docs/AUTHENTICATION_KR.md +54 -29
package/docs/BRANDING.md +9 -8
package/docs/CLI.md +55 -14
package/docs/COOKBOOK.md +8 -21
package/docs/DEMO.md +32 -5
package/docs/EXIT-CODES.md +2 -3
package/docs/FALSE-POSITIVES.md +63 -0
package/docs/FAQ.md +9 -1
package/docs/FAQ_KR.md +3 -1
package/docs/FLAG-PARITY.md +33 -47
package/docs/ISSUE-WAVES.md +57 -0
package/docs/PATTERNS-EN.md +67 -3
package/docs/PATTERNS-JA.md +68 -2
package/docs/PATTERNS-KO.md +70 -7
package/docs/PATTERNS-ZH.md +67 -3
package/docs/PATTERNS.md +5 -5
package/docs/RESEARCH-DOCS-PLATFORM.md +54 -0
package/docs/ROADMAP.md +46 -66
package/docs/TRANSLATIONESE-KO.md +51 -0
package/docs/audits/2026-05-deep-research.md +3 -1
package/docs/benchmarks/README.md +51 -0
package/docs/benchmarks/detector-comparison.json +69 -9
package/docs/benchmarks/detector-comparison.md +10 -5
package/docs/benchmarks/katfish-ko-latest.json +657 -0
package/docs/benchmarks/katfish-ko-latest.md +77 -0
package/docs/benchmarks/latest.json +1183 -108
package/docs/benchmarks/latest.md +84 -60
package/docs/benchmarks/lexicon-freshness-en-2026-05-22.json +1121 -0
package/docs/benchmarks/lexicon-freshness-en-2026-05-22.md +136 -0
package/docs/benchmarks/rebaseline-latest.json +381 -0
package/docs/benchmarks/rebaseline-latest.md +121 -0
package/docs/benchmarks/register-stratified-latest.json +164 -0
package/docs/benchmarks/register-stratified-latest.md +99 -0
package/docs/benchmarks/register-stratified.md +43 -0
package/docs/integrations/github-action.md +44 -11
package/docs/integrations/playground.md +58 -0
package/docs/integrations/pre-commit.md +5 -5
package/docs/integrations/release.md +5 -3
package/docs/integrations/static-sites.md +83 -0
package/docs/research/2025-rebaseline-plan.md +71 -2
package/docs/research/2026-rebaseline.md +102 -0
package/docs/research/adversarial-mps.md +41 -0
package/docs/research/ai-human-metrics.md +35 -23
package/docs/research/human-eval-panel.md +42 -0
package/docs/research/judge-agreement.md +24 -0
package/docs/research/ko-2025-corpus-sources.md +135 -0
package/docs/research/lexicon-freshness-audit.md +64 -0
package/docs/research/zh-ja-lexicon-calibration.md +60 -0
package/docs/social/patina-launch-copy.md +173 -100
package/docs/social/patina-launch-execution.md +94 -0
package/docs/social/patina-launch-korean-first.md +83 -0
package/docs/social/signs-of-ai-writing.md +26 -0
package/docs/social/signs-of-ai-writing_KR.md +26 -0
package/lexicon/ai-en.md +21 -24
package/lexicon/ai-ja.md +158 -0
package/lexicon/ai-ko.md +9 -9
package/lexicon/ai-zh.md +158 -0
package/lexicon/provenance/ai-en.json +970 -0
package/lexicon/provenance/ai-ja.json +542 -0
package/lexicon/provenance/ai-ko.json +866 -0
package/lexicon/provenance/ai-zh.json +542 -0
package/package.json +49 -8
package/patterns/en-communication.md +5 -0
package/patterns/en-content.md +5 -0
package/patterns/en-filler.md +5 -0
package/patterns/en-language.md +29 -1
package/patterns/en-structure.md +5 -0
package/patterns/en-style.md +5 -0
package/patterns/en-viral-hook.md +42 -2
package/patterns/ja-communication.md +5 -0
package/patterns/ja-content.md +5 -0
package/patterns/ja-filler.md +5 -0
package/patterns/ja-language.md +33 -1
package/patterns/ja-structure.md +12 -0
package/patterns/ja-style.md +5 -0
package/patterns/ja-viral-hook.md +41 -2
package/patterns/ko-communication.md +5 -0
package/patterns/ko-content.md +5 -0
package/patterns/ko-filler.md +5 -0
package/patterns/ko-language.md +33 -1
package/patterns/ko-structure.md +25 -6
package/patterns/ko-style.md +5 -0
package/patterns/ko-viral-hook.md +38 -2
package/patterns/zh-communication.md +5 -0
package/patterns/zh-content.md +5 -0
package/patterns/zh-filler.md +5 -0
package/patterns/zh-language.md +37 -1
package/patterns/zh-structure.md +12 -0
package/patterns/zh-style.md +5 -0
package/patterns/zh-viral-hook.md +38 -2
package/playground/README.md +55 -0
package/playground/analytics.js +4 -0
package/playground/analyzer.js +883 -0
package/playground/app.js +157 -0
package/playground/data/lexicons.js +343 -0
package/playground/index.html +138 -0
package/playground/styles.css +267 -0
package/profiles/namuwiki.md +111 -0
package/scripts/adversarial-mps-report.mjs +201 -0
package/scripts/badge-json.mjs +79 -0
package/scripts/benchmark-report.mjs +56 -9
package/scripts/check-release-metadata.mjs +0 -2
package/scripts/detector-comparison.mjs +7 -7
package/scripts/generate-playground-data.mjs +77 -0
package/scripts/katfish-calibration.mjs +464 -0
package/scripts/lexicon-freshness.mjs +485 -0
package/scripts/lint.mjs +1 -1
package/scripts/precommit-score.mjs +4 -3
package/scripts/prose-score.mjs +81 -5
package/scripts/rebaseline-intake.mjs +242 -0
package/scripts/rebaseline-score.mjs +268 -0
package/scripts/rebaseline-summary.mjs +773 -0
package/scripts/rebaseline-web-collect.mjs +410 -0
package/scripts/update-benchmark-ranges.mjs +1 -0
package/src/api.js +69 -105
package/src/auth.js +50 -2
package/src/backends/claude-cli.js +19 -4
package/src/backends/codex-cli.js +19 -3
package/src/backends/contract.js +230 -1
package/src/backends/gemini-cli.js +18 -5
package/src/backends/index.js +87 -12
package/src/backends/kimi-cli.js +161 -0
package/src/cli.js +577 -567
package/src/commands/doctor.js +2 -2
package/src/config.js +29 -0
package/src/errors.js +53 -1
package/src/features/discourse-tells.js +68 -0
package/src/features/index.js +82 -8
package/src/features/lexicon.js +40 -6
package/src/features/markup-leakage.js +69 -0
package/src/features/segment.js +41 -0
package/src/features/signal-strength.js +81 -0
package/src/features/stylometry.js +231 -1
package/src/features/translationese.js +127 -0
package/src/loader.js +76 -0
package/src/logger.js +22 -23
package/src/model-defaults.js +55 -0
package/src/ouroboros.js +31 -0
package/src/output.js +102 -90
package/src/prompt-builder.js +103 -68
package/src/providers.js +51 -4
package/src/scoring.js +210 -2
package/src/security.js +75 -0
package/tests/fixtures/live-quality/en/public-docs-01.md +26 -0
package/tests/fixtures/live-quality/ko/public-docs-01.md +26 -0
package/tests/fixtures/suspect-zones/expected-ranges.json +207 -16
package/tests/fixtures/suspect-zones/ja/ai/ja-ai-04-lexicon.md +11 -0
package/tests/fixtures/suspect-zones/ja/natural/ja-nat-04-lexicon-cold.md +11 -0
package/tests/fixtures/suspect-zones/ko/ai/ko-ai-02.md +4 -5
package/tests/fixtures/suspect-zones/ko/ai/ko-ai-07-ko-diagnostic.md +11 -0
package/tests/fixtures/suspect-zones/zh/ai/zh-ai-04-lexicon.md +11 -0
package/tests/fixtures/suspect-zones/zh/natural/zh-nat-04-lexicon-cold.md +11 -0
package/tests/quality/README.md +188 -11
package/tests/quality/adversarial-mps/fixtures.jsonl +10 -0
package/tests/quality/benchmark.mjs +39 -1
package/tests/quality/dogfood.mjs +5 -3
package/tests/quality/live-fixtures.jsonl +2 -0
package/tests/quality/live-quality.mjs +596 -0
package/tests/quality/ranking-metrics.mjs +136 -0
package/tests/quality/rebaseline-manifest.example.jsonl +5 -0
package/vercel.json +53 -0
package/SKILL-MAX.md +0 -455
package/docs/internal/HARNESS.md +0 -14
package/docs/internal/README.md +0 -14
package/docs/internal/WARP.md +0 -23
package/patina-max/SKILL.md +0 -523
package/patina-max/composite.py +0 -457
package/src/cache.js +0 -106
package/src/commands/init.js +0 -208
package/src/manifest.js +0 -162
package/src/max-mode.js +0 -207

package/tests/quality/README.md CHANGED

@@ -14,6 +14,7 @@ Outputs:
 - A markdown table per language (accuracy, precision, recall, F1, confusion matrix)
 - A list of any misclassified fixtures with their feature values
 - `tests/quality/results.json` — full per-fixture log (gitignored)
+- `docs/benchmarks/README.md` — report index, refresh commands, and public-claim rules
 - `docs/benchmarks/latest.md` / `latest.json` when run via `npm run benchmark:report`
 - `docs/benchmarks/detector-comparison.md` / `.json` when run via `npm run benchmark:compare`
@@ -23,34 +24,193 @@ Every fixture under `tests/fixtures/suspect-zones/{lang}/{ai|natural}/*.md`
 carries an `expected_hot` label in its frontmatter. The benchmark runs
 `analyzeText()` (defined in `src/features/index.js`) on the body and
 compares the predicted hot/cold decision against that label. The decision
-follows the 3-signal OR rule from `core/stylometry.md` §16:
+follows the 4-signal OR rule from `core/stylometry.md` §16:
 ```
 paragraph is SUSPECT iff
   burstiness_band == "low"  OR
   MATTR_band == "low"       OR
-  lexicon_density > threshold
+  (lexicon_density > threshold AND lexicon_min_hits is satisfied) OR
+  koDiagnostics.hot == true
 ```
+`burstiness_band` is only assigned when a paragraph has at least three
+sentences; two-sentence CV is recorded for diagnostics but is not stable enough
+to classify a paragraph by itself. For ko/zh/ja, a single lexicon hit is also
+only an audit hint; the default hot threshold requires at least two CJK hits
+to make the paragraph hot by itself.
+For `lang=ko`, `analyzeText()` also records Korean diagnostic fields:
+`spacing`, `comma`, and `posDiversity` (a suffix-class proxy, not a morphology
+analyzer). They only affect the hot/cold decision through the conservative
+`koDiagnostics` composite: at least four sentences, at least 20 eojeols, fewer than one comma per sentence, regular eojeol length (`CV <= 0.38`), and low suffix-class diversity (`classDiversity <= 0.26`).
 Per-language metrics use `expected_hot=true` as the positive class.
+## Opt-in live rewrite quality
+`npm run quality:live` runs the live-quality runner without calling a model by
+default. The default path scores fixture inputs and marks the live rewrite step
+as skipped, so it is safe for local smoke checks and CI dry-runs.
+```bash
+npm run quality:live
+npm run quality:live -- --json
+```
+To run actual rewrites, opt in explicitly with an OpenAI-compatible provider.
+Use `PATINA_LIVE_*` so this stays a deliberate local/manual probe rather than a
+per-PR network dependency:
+```bash
+PATINA_LIVE=1 \
+PATINA_LIVE_PROVIDER=gemini \
+PATINA_LIVE_API_KEY=... \
+npm run quality:live -- --language ko --limit 1
+```
+Supported live settings:
+- `PATINA_LIVE_PROVIDER` — provider preset (`openai`, `gemini`, `groq`,
+  `kimi`, `moonshot`, `together`).
+- `PATINA_LIVE_API_KEY` — live-run key; falls back to the provider key or
+  `PATINA_API_KEY`.
+- `PATINA_LIVE_MODEL` / `PATINA_LIVE_API_BASE` / `PATINA_LIVE_TIMEOUT_MS`.
+The fixture set lives in `tests/fixtures/live-quality/{en,ko}/*.md` with YAML
+frontmatter (`fixture_id`, `language`, optional `profile`, `anchors`,
+`expected_focus`) plus the body text. The legacy
+`tests/quality/live-fixtures.jsonl` remains loadable via `--fixtures`.
+Live reports are structured JSON or Markdown with:
+- `schema_version`, redacted settings, and policy floors.
+- `before_score` / `after_score` from model-graded `scoreText`.
+- `mps` from `scoreMPS`.
+- `fidelity` from `scoreFidelity`.
+- `pass`, `warn`, `error`, or `skipped` per fixture.
+A live rewrite passes when `after_score <= 30`, MPS is at least 70, fidelity is
+at least 70, and the AI score improved. Missing credentials, provider failures,
+schema failures, and MPS/fidelity floor violations are `error` and exit
+nonzero; AI-score target misses remain `warn` so the report is still usable.
+Keep this out of mandatory CI unless the live model path is deliberately
+allowed, because LLM output is non-deterministic and may incur provider cost.
+## Adversarial MPS fixtures
+`npm run quality:adversarial-mps` validates a small, repo-owned fixture set
+where explicit meaning anchors are preserved but AI-like wording remains. This
+guards against treating MPS as a humanness score.
+```bash
+npm run quality:adversarial-mps
+node scripts/adversarial-mps-report.mjs --check --json
+```
+Inputs live in `tests/quality/adversarial-mps/fixtures.jsonl`; the report is
+written to `docs/research/adversarial-mps.md`. The gate is:
+- anchor-MPS proxy ≥90;
+- deterministic AI score ≥60;
+- no private or scraped source text.
+If this gate passes, the case is intentionally adversarial: meaning survived,
+but style still needs work. Ouroboros selection should prefer candidates that
+pass MPS and lower the AI score, rather than letting high MPS hide
+recurring AI markers.
+## 2025+ rebaseline manifest
+`npm run benchmark:rebaseline` validates the public JSONL manifest scaffold and
+prints matrix coverage. It does not collect text from vendors, call external
+detectors, or turn a small sample into a headline claim.
+```bash
+npm run benchmark:rebaseline
+npm run benchmark:rebaseline:report
+node scripts/rebaseline-summary.mjs --input tests/quality/rebaseline-manifest.example.jsonl --json
+npm run benchmark:rebaseline:intake -- --input artifacts/rebaseline-2025/intake.example.jsonl --dry-run
+npm run benchmark:rebaseline:intake -- --input artifacts/rebaseline-2025/intake.local.example.jsonl --dry-run --require-source-review
+npm run benchmark:rebaseline:web -- --target-per-register 50 --max-per-source 12 --collected-at 2026-05-22
+npm run benchmark:rebaseline:score -- --input artifacts/rebaseline-2025/private/web-human-controls.generated.private.jsonl --output artifacts/rebaseline-2025/human-controls.public.jsonl --scored-at 2026-05-22
+node scripts/rebaseline-summary.mjs --input artifacts/rebaseline-2025/human-controls.public.jsonl --json
+```
+Each row records the source metadata needed by
+`docs/research/2025-rebaseline-plan.md`: `sample_id`, `language`, `class`,
+`register`, `model_family`, `provider`, `model`, `generated_at`, `prompt_id`,
+`decoding`, `postprocess`, `redistribution`, and `text_hash`. Full `text` is
+allowed only for redistributable rows (`repo-ok`, `redistributable`, public
+license values). Private or vendor-copied rows must stay metadata-only and use
+hashes.
+For local/private corpus intake, use `npm run benchmark:rebaseline:intake`.
+It computes missing `text_hash` values and writes a public manifest that strips
+full text from non-redistributable rows while preserving the full row in the
+gitignored private output. Use `--require-source-review` before pilot reports so
+non-public rows must explain their redistribution status through `source_review`
+or `reviewer_notes`. The tracked `artifacts/rebaseline-2025/intake.example.jsonl`
+fixture and `artifacts/rebaseline-2025/intake.local.example.jsonl` 25-row
+template are smoke checks only; real corpus rows stay local until a license
+review says otherwise.
+`artifacts/rebaseline-2025/human-controls.public.jsonl` is the first tracked
+web-sourced Korean human-control candidate manifest. It is metadata/hash-only:
+no raw source text is committed. Its deterministic outcome fields are register-stratified false-positive
+evidence; public catch-rate claims require positive AI-like rows and claim-cell coverage, now provided by `rebaseline-2026.scored.public.jsonl` for KO+EN.
+The #155 report is claim-ready only when the process gate is satisfied: scored outcome rows, at least three generator families across at least two languages, n≥100 per claim cell, and confidence intervals. The checked-in 2026 manifest now satisfies that gate for KO+EN.
+`npm run benchmark:rebaseline:report` refreshes
+`docs/benchmarks/rebaseline-latest.md` and `.json`. Use `tests/quality/rebaseline-manifest.example.jsonl` for a BLOCKED smoke fixture; use `artifacts/rebaseline-2025/rebaseline-2026.scored.public.jsonl` for the current READY public report.
+## Score vs signal strength
+The pre-commit prose gate keeps the older, conservative score semantics:
+```text
+score = hot_paragraphs / total_paragraphs * 100
+```
+That binary ratio decides pass/fail because it is stable for CI. The report also
+prints two diagnostics:
+- `signal` — average paragraph intensity of the strongest deterministic trigger:
+  how far burstiness or MATTR is inside its low band, how far lexicon density
+  is over the threshold, or how strong the Korean diagnostic composite is.
+- `pattern hits` — count of pattern-pack watch terms found in the stripped prose.
+  This is diagnostic only; it helps reviewers see pattern-level cleanup that may
+  not change the binary hot-paragraph ratio.
+Treat both as editing diagnostics, not separate authorship verdicts or CI gates.
+The prose gate uses the default deterministic thresholds and the current
+Markdown pattern packs. Runtime scoring may use project config thresholds, so
+compare `signal` values within the same entrypoint rather than across tools.
+Report person-written paragraphs that cross the gate through the false-positive
+form: <https://github.com/devswha/patina/issues/new?template=false_positive.yml>.
+Include the exact paragraph, language/register, score output, and whether the
+sample can become a public fixture.
 ## What it does NOT measure
 - LLM-based scoring (`src/scoring.js`). The LLM is non-deterministic by
   design and adds API cost / latency, so it stays out of this layer.
   A separate live-mode benchmark would be its own follow-up.
-- Rewrite quality (does the rewritten text read better?). That requires
-  human or LLM grading and lives in `tests/e2e/quality-test.js`.
-  That script is opt-in because it shells out to OpenCode:
+- Mandatory rewrite quality gates. Live rewrite quality lives in
+  `tests/quality/live-quality.mjs` and remains opt-in because it can shell out
+  to OpenCode:
   ```bash
-  OPENCODE_AVAILABLE=1 node tests/e2e/quality-test.js
+  OPENCODE_AVAILABLE=1 npm run quality:live -- --limit 1
   ```
-  The script uses `opencode/hy3-preview-free` by default. Override it with
+  The scaffold uses `opencode/hy3-preview-free` by default. Override it with
   `OPENCODE_MODEL=<provider/model>` when testing another OpenCode model.
-- AUROC against a ranked score — the current decision is binary
-  (hot/cold), so we report accuracy + F1 instead.
+- Generalized model-era detector claims. The report now includes
+  `signal_score` ranking diagnostics (ROC-AUC, PR-AUC, best-F1 threshold), but
+  those numbers are still limited to the checked-in fixture corpus.
 ## Extending the corpus
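Review note: the 4-signal OR rule described earlier in this hunk can be sketched as below. This is a minimal illustration, not the package's `analyzeText()` code. The field names follow what the diff shows (`burstiness.band`, `mattr.band`, `lexicon.density`, `lexicon.hits`, `koDiagnostics.hot`), while `isSuspect`, the `cv` field, the option names, and the numeric defaults are assumptions made for the example.

```javascript
// Sketch only: option names and default numbers are hypothetical.
function isSuspect(paragraph, { lexiconThreshold = 0.02, lexiconMinHits = 1 } = {}) {
  // A band is only assigned when the paragraph has at least three sentences,
  // so a missing band never fires this trigger.
  const lowBurstiness = paragraph.burstiness?.band === 'low';
  const lowMattr = paragraph.mattr?.band === 'low';

  // Lexicon density counts only when the minimum-hit requirement is also met.
  const density = paragraph.lexicon?.density ?? 0;
  const hits = paragraph.lexicon?.hits ?? [];
  const lexiconHot = density > lexiconThreshold && hits.length >= lexiconMinHits;

  // The ko-only composite is treated here as a ready-made boolean.
  const koHot = paragraph.koDiagnostics?.hot === true;

  return lowBurstiness || lowMattr || lexiconHot || koHot;
}

// Two-sentence Korean paragraph: CV recorded, no band, one lexicon hit.
const shortKo = {
  burstiness: { cv: 0.12 },
  mattr: {},
  lexicon: { density: 0.05, hits: ['자리매김'] },
  koDiagnostics: { hot: false },
};
const cjkOptions = { lexiconThreshold: 0.02, lexiconMinHits: 2 };

console.log(isSuspect(shortKo, cjkOptions)); // false: one CJK hit is only an audit hint
console.log(
  isSuspect({ ...shortKo, lexicon: { density: 0.05, hits: ['자리매김', '극대화'] } }, cjkOptions)
); // true: two hits and density over the threshold
```

Keeping each trigger as its own boolean makes it easy to log which detector fired, which is what `detectorHot()` in `benchmark.mjs` appears to rely on.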
@@ -111,11 +271,28 @@ in `.patina.default.yaml` (`stylometry.burstiness.bands`,
 classification. Sweep against this benchmark + your own corpus and
 update thresholds; the shipped values come from the v3.5.1 / v3.7
 calibration documented in `core/stylometry.md` §13 §16.
+`stylometry.ko_diagnostics.bands` controls the ko-only composite. The private
+KatFish calibration command below reports aggregate catch-rate and FP deltas
+without committing external raw text:
+```bash
+npm run benchmark:katfish-ko -- --write --basename katfish-ko-latest
+```
+Treat that report as a KO diagnostic calibration artifact, not as a broad public
+performance claim.
+`npm run benchmark:report` also records a diagnostic `signal_score` sweep. The
+prediction rule is `signal_score >= threshold`, and the PR-AUC value is average
+precision over descending score groups. Use it to compare tuning candidates, not
+as an authorship verdict.
 ## Languages
 Currently runs on all supported pattern-pack languages: `ko`, `en`, `zh`, and
 `ja`. Chinese and Japanese use a deterministic character-token fallback because
 normal prose often has no whitespace; ko/en keep whitespace tokenization.
-Language-specific zh/ja lexicons are still future work, so current zh/ja
-fixtures are mainly burstiness/MATTR regression coverage.
+Korean additionally emits dependency-free spacing/comma/suffix-diversity
+diagnostics and a conservative ko-only composite detector.
+zh/ja now include high-precision AI-lexicon fixtures as well as
+burstiness/MATTR regression coverage.
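Review note: the hunk above describes PR-AUC as "average precision over descending score groups" with the prediction rule `signal_score >= threshold`. A minimal sketch of that idea follows. `averagePrecision` is a hypothetical helper written for this note, not the `summarizeRanking` export from `ranking-metrics.mjs`, whose body is not shown in this diff. Returning 0 when there are no positive records is also an assumption.

```javascript
// Sketch: average precision where tied scores are evaluated as one group.
// Each record is { score: number, expected: boolean }, matching the shape
// that rankingRecords() builds in benchmark.mjs.
function averagePrecision(records) {
  const totalPositives = records.filter((r) => r.expected).length;
  if (totalPositives === 0) return 0; // assumption: undefined case reported as 0

  const sorted = [...records].sort((a, b) => b.score - a.score);
  let tp = 0;
  let fp = 0;
  let prevRecall = 0;
  let ap = 0;
  let i = 0;

  while (i < sorted.length) {
    // Consume every record sharing this score; lowering the threshold to
    // groupScore flips all of them to "predicted hot" at once.
    const groupScore = sorted[i].score;
    while (i < sorted.length && sorted[i].score === groupScore) {
      if (sorted[i].expected) tp += 1;
      else fp += 1;
      i += 1;
    }
    const precision = tp / (tp + fp);
    const recall = tp / totalPositives;
    ap += (recall - prevRecall) * precision;
    prevRecall = recall;
  }
  return ap;
}

const sample = [
  { score: 0.9, expected: true },
  { score: 0.8, expected: false },
  { score: 0.7, expected: true },
];
console.log(averagePrecision(sample).toFixed(3)); // 0.833
```

Grouping ties avoids rewarding an arbitrary sort order among fixtures that share the same `signal_score`.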

package/tests/quality/adversarial-mps/fixtures.jsonl ADDED

@@ -0,0 +1,10 @@
+{"id":"adv-mps-ko-01","lang":"ko","register":"marketing","original":"워크스페이스는 회의록, 할 일, 프로젝트 일정을 한 곳에서 관리한다. 템플릿은 30개이며 복제해서 수정할 수 있다.","rewritten":"이 워크스페이스는 회의록과 할 일, 프로젝트 일정을 통합적으로 관리할 수 있는 혁신적인 생산성 솔루션이자 핵심 가치로 자리매김합니다. 다양한 업무 맥락에서 템플릿 30개를 활용할 수 있으며, 사용자는 이를 복제해서 수정함으로써 업무 효율성을 극대화할 수 있습니다.","anchors":["회의록","할 일","프로젝트 일정","템플릿 30개","복제해서 수정"],"register_note":"Preserves facts but re-adds clustered marketing language."}
+{"id":"adv-mps-ko-02","lang":"ko","register":"technical","original":"배치 작업은 매일 02시에 실행된다. 실패하면 슬랙 알림을 보내고 재시도는 세 번으로 제한한다.","rewritten":"본 배치 작업은 매일 02시에 안정적으로 실행되도록 설계되어 있으며, 실패 상황에서는 슬랙 알림을 통해 즉각적인 대응을 지원합니다. 또한 재시도는 세 번으로 제한하는 체계적인 운영 방식으로 전체 운영 신뢰성을 향상시키는 핵심 원칙으로 자리매김합니다.","anchors":["매일 02시","슬랙 알림","재시도는 세 번"],"register_note":"Operational facts preserved; AI register remains."}
+{"id":"adv-mps-ko-03","lang":"ko","register":"academic","original":"실험에는 저장소 60개가 포함됐다. 평균 설정 시간은 72시간에서 10분으로 줄었고, 표본이 작아 일반화에는 주의가 필요하다.","rewritten":"본 실험은 저장소 60개를 대상으로 수행되었으며, 평균 설정 시간이 72시간에서 10분으로 감소했다는 점에서 후속 논의의 핵심 기반이자 중요한 의미를 지닙니다. 다만 표본이 작기 때문에 결과를 일반화하는 데에는 신중한 접근이 필요하다고 할 수 있습니다.","anchors":["저장소 60개","72시간에서 10분","표본이 작","일반화"],"register_note":"High MPS with clustered academic packaging."}
+{"id":"adv-mps-ko-04","lang":"ko","register":"product-doc","original":"대시보드는 CSV 내보내기를 지원한다. 필터는 팀, 기간, 상태 세 가지다.","rewritten":"이 대시보드는 CSV 내보내기를 지원함으로써 데이터 활용성을 높이는 데 기여합니다. 또한 팀, 기간, 상태 세 가지 필터를 제공하여 사용자가 다양한 관점에서 정보를 효율적으로 탐색할 수 있는 업무 생태계와 핵심 운영 양상을 제공합니다.","anchors":["CSV 내보내기","팀","기간","상태","세 가지 필터"],"register_note":"Product-doc facts preserved; support/efficiency wording recurs."}
+{"id":"adv-mps-ko-05","lang":"ko","register":"policy","original":"신청 기간은 6월 1일부터 6월 14일까지다. 개인은 온라인 양식으로 접수하고, 결과는 7월 3일에 공개된다.","rewritten":"본 신청 기간은 6월 1일부터 6월 14일까지로 운영되며, 개인은 온라인 양식을 통해 접수할 수 있습니다. 결과는 7월 3일에 공개될 예정으로, 신청자는 해당 일정을 사전에 확인하는 것이 핵심이며 안정적인 접수 운영에 중요한 의미를 지닙니다.","anchors":["6월 1일부터 6월 14일까지","온라인 양식","7월 3일"],"register_note":"Dates preserved; officialese packaging remains."}
+{"id":"adv-mps-en-01","lang":"en","register":"marketing","original":"The app imports invoices, groups them by client, and exports a CSV summary at the end of each month.","rewritten":"The app provides a seamless workflow that imports invoices, groups them by client, and exports a CSV summary at the end of each month. This streamlined experience empowers teams to unlock more actionable monthly reporting without changing the underlying billing process.","anchors":["imports invoices","groups them by client","exports a CSV summary","end of each month"],"register_note":"Meaning preserved with dense AI-favored vocabulary."}
+{"id":"adv-mps-en-02","lang":"en","register":"technical","original":"The cache expires after 24 hours. Users can force a manual refresh when debugging stale responses.","rewritten":"The cache is designed as a robust framework that expires after 24 hours while still enabling users to force a manual refresh when debugging stale responses. This approach offers a scalable and thoughtful balance between performance and developer control.","anchors":["expires after 24 hours","manual refresh","debugging stale responses"],"register_note":"Exact controls preserved; AI-like abstraction added."}
+{"id":"adv-mps-en-03","lang":"en","register":"academic","original":"The survey covered 42 maintainers. Twenty-nine said review latency was the main blocker, but the sample was self-selected.","rewritten":"The survey covered 42 maintainers and surfaced a compelling insight: 29 respondents identified review latency as the main blocker. However, because the sample was self-selected, the findings should be interpreted through a nuanced and ethical research lens.","anchors":["42 maintainers","29 respondents","review latency","self-selected"],"register_note":"Numbers and caveat preserved with AI-signature phrasing."}
+{"id":"adv-mps-en-04","lang":"en","register":"support","original":"Password reset links expire in 15 minutes. If a user requests another link, the older link stops working.","rewritten":"Password reset links expire in 15 minutes, creating a secure and user-friendly experience. If a user requests another link, the older link stops working, which helps align the reset workflow with modern account-safety expectations.","anchors":["expire in 15 minutes","requests another link","older link stops working"],"register_note":"Security behavior preserved; packaged UX framing added."}
+{"id":"adv-mps-en-05","lang":"en","register":"strategy","original":"The team will cut weekly planning from 90 minutes to 45 minutes and keep Friday demos unchanged.","rewritten":"The team will streamline weekly planning from 90 minutes to 45 minutes while keeping Friday demos unchanged. This targeted adjustment can accelerate decision-making, bolster alignment, and create a more sustainable operating rhythm without disrupting the existing demo cadence.","anchors":["weekly planning","90 minutes to 45 minutes","Friday demos unchanged"],"register_note":"Schedule facts preserved; AI-favored strategy language remains."}
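Review note: each fixture row above pairs a `rewritten` text with the `anchors` it must preserve, and the README gate requires an anchor-MPS proxy of at least 90 plus a deterministic AI score of at least 60. The sketch below shows one plausible reading of the anchor side: the share of anchors found verbatim in the rewrite. How `scripts/adversarial-mps-report.mjs` really computes the proxy is not visible in this diff, so `anchorCoverage` and `passesAdversarialGate` are hypothetical names, and the AI score is passed in as a plain number.

```javascript
// Sketch: percentage of anchors that survive verbatim in the rewritten text.
function anchorCoverage(rewritten, anchors) {
  if (anchors.length === 0) return 100; // assumption: nothing to lose
  const kept = anchors.filter((anchor) => rewritten.includes(anchor)).length;
  return (kept / anchors.length) * 100;
}

// Gate from the README: meaning preserved (>= 90) while wording still
// scores as AI-like (>= 60).
function passesAdversarialGate({ anchorMps, aiScore }) {
  return anchorMps >= 90 && aiScore >= 60;
}

// Values copied from fixture adv-mps-en-04 in this file.
const fixture = {
  rewritten:
    'Password reset links expire in 15 minutes, creating a secure and user-friendly experience. ' +
    'If a user requests another link, the older link stops working, which helps align the reset ' +
    'workflow with modern account-safety expectations.',
  anchors: ['expire in 15 minutes', 'requests another link', 'older link stops working'],
};

const coverage = anchorCoverage(fixture.rewritten, fixture.anchors);
console.log(coverage); // 100

// The AI score here is a made-up input for illustration, not a measured value.
console.log(passesAdversarialGate({ anchorMps: coverage, aiScore: 65 })); // true
```

Exact substring matching is deliberately strict; it would miss paraphrased anchors, which is one reason the README calls this a proxy rather than a meaning score.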

package/tests/quality/benchmark.mjs CHANGED

@@ -16,6 +16,8 @@ import yaml from 'js-yaml';
 import { analyzeText } from '../../src/features/index.js';
 import { loadLexicon } from '../../src/features/lexicon.js';
+import { summarizeSignalStrength } from '../../src/features/signal-strength.js';
+import { summarizeRanking } from './ranking-metrics.mjs';
 const __dirname = dirname(fileURLToPath(import.meta.url));
 const REPO_ROOT = resolve(__dirname, '../..');
@@ -89,6 +91,29 @@ function summarize(m) {
   };
 }
+function rankingRecords(fixtures) {
+  return fixtures.map((fixture) => ({
+    score: fixture.signal_score,
+    expected: fixture.expected_hot,
+  }));
+}
+function summarizeRankingByLanguage(fixtures) {
+  const byLanguage = {};
+  for (const fixture of fixtures) {
+    byLanguage[fixture.lang] ||= [];
+    byLanguage[fixture.lang].push({
+      score: fixture.signal_score,
+      expected: fixture.expected_hot,
+    });
+  }
+  return Object.fromEntries(
+    Object.entries(byLanguage)
+      .sort(([a], [b]) => a.localeCompare(b))
+      .map(([lang, records]) => [lang, summarizeRanking(records)])
+  );
+}
 function round(n, digits = 3) {
   return Math.round(n * 10 ** digits) / 10 ** digits;
 }
@@ -108,6 +133,7 @@ function wilsonInterval(successes, n, z = 1.959963984540054) {
 function detectorHot(result) {
   return {
     burstiness: result.paragraphs.some((p) => p.burstiness?.band === 'low'),
+    koDiagnostics: result.paragraphs.some((p) => p.koDiagnostics?.hot),
     mattr: result.paragraphs.some((p) => p.mattr?.band === 'low'),
     lexicon: result.paragraphs.some((p) => p.lexicon?.hot),
   };
@@ -116,6 +142,7 @@ function detectorHot(result) {
 function emptyDetectorMetrics() {
   return {
     burstiness: emptyMetrics(),
+    koDiagnostics: emptyMetrics(),
     mattr: emptyMetrics(),
     lexicon: emptyMetrics(),
   };
@@ -214,6 +241,10 @@ function main() {
       mattr_band: p.mattr?.band,
       lexicon_density: round(p.lexicon?.density ?? 0),
       lexicon_hits: p.lexicon?.hits ?? [],
+      ko_diagnostics_hot: Boolean(p.koDiagnostics?.hot),
+      ko_diagnostics_reasons: p.koDiagnostics?.reasons ?? [],
+      ko_diagnostics_strength: round(p.koDiagnostics?.strength ?? 0),
+      signal_score: round(summarizeSignalStrength(result.paragraphs)),
     };
     const pinned = expectedRanges[meta.fixture_id];
     if (!pinned) {
@@ -253,7 +284,7 @@ function main() {
   const overallCi = wilsonInterval(totalCorrect, totalCount);
   const results = {
-    schemaVersion: 2,
+    schemaVersion: 3,
     fixtureSchemaVersion: FIXTURE_SCHEMA_VERSION,
     nodeVersion: process.version,
     generatedAt: new Date().toISOString(),
@@ -267,6 +298,12 @@ function main() {
       confidence_method: 'Wilson score interval, 95%',
     },
     perLanguage: summary,
+    ranking: {
+      note: 'Signal-score ranking over the checked-in fixture corpus; diagnostic only, not a public generalization claim.',
+      score: 'signal_score from the strongest deterministic paragraph trigger, averaged per fixture',
+      overall: summarizeRanking(rankingRecords(fixtureLog)),
+      perLanguage: summarizeRankingByLanguage(fixtureLog),
+    },
     fixtures: fixtureLog,
   };
@@ -277,6 +314,7 @@ function main() {
   if (!quiet) {
     console.log(`# Quality benchmark — ${fixtureLog.length} fixtures`);
     console.log(`Overall accuracy: ${(overallAccuracy * 100).toFixed(1)}%`);
+    console.log(`Signal ROC-AUC: ${results.ranking.overall.roc_auc.toFixed(3)} · PR-AUC: ${results.ranking.overall.pr_auc.toFixed(3)} · best-F1 threshold: ${results.ranking.overall.bestF1.threshold}`);
     console.log();
     console.log('| lang | n | accuracy | precision | recall | f1 | TP | FP | FN | TN |');
     console.log('|------|---|----------|-----------|--------|----|----|----|----|----|');
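Review note: the new `ranking` block reports `roc_auc` from `{ score, expected }` records, but `summarizeRanking` itself lives in `ranking-metrics.mjs` and is not shown here. As a rough sketch of what a ROC-AUC over those records means, the pairwise form below counts how often a positive fixture outranks a negative one, with ties worth half. `rocAuc` and `groupByLanguage` are hypothetical helpers, and the fixture values are invented for the example.

```javascript
// Sketch: ROC-AUC as the probability that a random positive record scores
// higher than a random negative one (ties count as 0.5).
function rocAuc(records) {
  const positives = records.filter((r) => r.expected).map((r) => r.score);
  const negatives = records.filter((r) => !r.expected).map((r) => r.score);
  if (positives.length === 0 || negatives.length === 0) return NaN; // undefined case

  let wins = 0;
  for (const p of positives) {
    for (const n of negatives) {
      if (p > n) wins += 1;
      else if (p === n) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}

// Same grouping idea as summarizeRankingByLanguage() in the diff.
function groupByLanguage(fixtures) {
  const byLanguage = {};
  for (const fixture of fixtures) {
    byLanguage[fixture.lang] ||= [];
    byLanguage[fixture.lang].push({ score: fixture.signal_score, expected: fixture.expected_hot });
  }
  return byLanguage;
}

// Invented fixture log, for illustration only.
const fixtureLog = [
  { lang: 'en', signal_score: 0.9, expected_hot: true },
  { lang: 'en', signal_score: 0.8, expected_hot: false },
  { lang: 'en', signal_score: 0.7, expected_hot: true },
  { lang: 'en', signal_score: 0.1, expected_hot: false },
  { lang: 'ko', signal_score: 0.6, expected_hot: true },
  { lang: 'ko', signal_score: 0.2, expected_hot: false },
];

for (const [lang, records] of Object.entries(groupByLanguage(fixtureLog))) {
  console.log(lang, rocAuc(records)); // en 0.75, then ko 1
}
```

The pairwise loop is quadratic, which is acceptable for a checked-in fixture corpus of this size; a rank-based formulation would be the usual choice for larger sets.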

package/tests/quality/dogfood.mjs CHANGED

@@ -21,6 +21,8 @@ const TARGETS = [
   { file: 'README_ZH.md', lang: 'zh' },
   { file: 'README_JA.md', lang: 'ja' },
   { file: 'docs/FAQ.md', lang: 'en' },
+  { file: 'docs/social/signs-of-ai-writing.md', lang: 'en' },
+  { file: 'docs/social/signs-of-ai-writing_KR.md', lang: 'ko' },
   { file: 'SKILL.md', lang: 'ko' },
 ];
@@ -31,10 +33,10 @@ function scoreFile({ file, lang }) {
 const rows = TARGETS.map(scoreFile);
 console.log('# Dogfood docs score');
-console.log('| file | lang | paragraphs | hot | score | threshold |');
-console.log('|---|---|---:|---:|---:|---:|');
+console.log('| file | lang | paragraphs | hot | score | signal | pattern hits | threshold |');
+console.log('|---|---|---:|---:|---:|---:|---:|---:|');
 for (const r of rows) {
-  console.log(`| ${r.file} | ${r.lang} | ${r.paragraphCount} | ${r.hotCount} | ${r.score.toFixed(1)} | ${THRESHOLD} |`);
+  console.log(`| ${r.file} | ${r.lang} | ${r.paragraphCount} | ${r.hotCount} | ${r.score.toFixed(1)} | ${r.signalScore.toFixed(1)} | ${r.patternHits} | ${THRESHOLD} |`);
 }
 const failures = rows.filter((r) => r.score > THRESHOLD);
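Review note: the `score` column and the `failures` filter above follow the binary ratio from the README (`score = hot_paragraphs / total_paragraphs * 100`). A minimal sketch of that arithmetic is below. The `THRESHOLD` value, the file names, and the paragraph counts are invented for illustration, and returning 0 for an empty document is an assumption; `scoreFile()` itself is outside this hunk.

```javascript
// Hypothetical threshold; the real constant is defined outside the hunk shown.
const THRESHOLD = 30;

// Binary hot-paragraph ratio, expressed as a percentage.
function proseScore({ hotCount, paragraphCount }) {
  if (paragraphCount === 0) return 0; // assumption: empty input scores 0
  return (hotCount / paragraphCount) * 100;
}

// Invented rows, shaped like the fields the table prints.
const rows = [
  { file: 'a.md', hotCount: 1, paragraphCount: 8 },
  { file: 'b.md', hotCount: 5, paragraphCount: 10 },
].map((row) => ({ ...row, score: proseScore(row) }));

// Same comparison as the diff: a row fails only when it exceeds the threshold.
const failures = rows.filter((r) => r.score > THRESHOLD);
console.log(failures.map((r) => `${r.file}: ${r.score.toFixed(1)}`)); // [ 'b.md: 50.0' ]
```

Because only this ratio decides pass/fail, the new `signal` and `pattern hits` columns can change without affecting the gate.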

package/tests/quality/live-fixtures.jsonl ADDED

@@ -0,0 +1,2 @@
+{"fixture_id":"en-coffee-public-docs-01","language":"en","register":"public-docs","source_type":"synthetic-ai","model_family":"fixture","prompt_id":"live-quality-v1","redistribution":"repo-ok","facts":["coffee","Paris","Tokyo","coffee shops","climate change"],"text":"Coffee has emerged as a pivotal cultural phenomenon that has fundamentally transformed social interactions across the globe. This beloved beverage serves as a catalyst for community building, fosters meaningful connections, and facilitates cross-cultural dialogue. From the bustling cafés of Paris to the serene tea houses repurposed for coffee in Tokyo, this remarkable journey showcases the innovative spirit of human culinary exploration.\n\nThe proliferation of coffee shops in urban centers has created unprecedented opportunities for social engagement. Patrons from diverse backgrounds converge in these spaces, united by their shared appreciation for this aromatic brew. Furthermore, the ritual of coffee consumption has transcended mere sustenance, evolving into a cornerstone of modern social etiquette.\n\nIndustry experts agree that the coffee sector will continue its growth trajectory. Despite challenges related to climate change and supply chain disruptions, the future remains bright. This beverage will maintain its position as an indispensable component of global culture."}
+{"fixture_id":"ko-coffee-public-docs-01","language":"ko","register":"public-docs","source_type":"synthetic-ai","model_family":"fixture","prompt_id":"live-quality-v1","redistribution":"repo-ok","facts":["커피","서울","부산","기후 변화","공급망"],"text":"커피는 현대 사회적 상호작용을 근본적으로 변화시킨 핵심적인 문화 현상으로 자리매김하고 있습니다. 이 음료는 공동체 형성을 촉진하고 의미 있는 연결을 가능하게 하며, 다양한 문화권 사이의 대화를 활성화하는 중요한 매개체로 기능합니다. 서울의 번화한 카페 거리부터 부산의 조용한 로스터리까지, 커피 문화는 인간의 창의적 식문화를 보여주는 대표적인 사례입니다.\n\n도시 중심부에서 커피 전문점이 확산되면서 사회적 참여를 위한 전례 없는 기회가 창출되고 있습니다. 다양한 배경의 고객들은 이 향기로운 음료에 대한 공통된 선호를 바탕으로 한 공간에 모입니다. 나아가 커피 소비 의례는 단순한 기호식품을 넘어 현대적 생활양식의 중요한 구성 요소로 진화했습니다.\n\n업계 전문가들은 커피 산업이 앞으로도 성장 궤도를 유지할 것이라고 봅니다. 기후 변화와 공급망 불안이라는 과제가 존재하지만, 시장의 미래는 여전히 밝다고 평가됩니다. 커피는 세계 문화의 필수적인 요소로 계속 자리할 것입니다."}
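Review note: these fixtures feed the opt-in live runner, whose README section states that a rewrite passes when `after_score <= 30`, MPS and fidelity are each at least 70, and the AI score improved, with MPS/fidelity floor violations reported as `error` and AI-score target misses as `warn`. The sketch below encodes that rule under stated assumptions: `classifyLiveResult` is a hypothetical name, `mps` and `fidelity` are treated as plain numbers, "improved" is read as `after_score < before_score`, and a non-improving rewrite is treated as `warn`. Credential, provider, and schema failures are not modeled.

```javascript
// Policy floors as stated in the README section of this diff.
const FLOORS = { aiTarget: 30, mps: 70, fidelity: 70 };

// Sketch: map one fixture's scores to 'pass', 'warn', or 'error'.
function classifyLiveResult({ before_score, after_score, mps, fidelity }, floors = FLOORS) {
  // Meaning or fidelity loss is a hard failure.
  if (mps < floors.mps || fidelity < floors.fidelity) return 'error';

  // Assumption: "improved" means the AI score strictly decreased.
  const improved = after_score < before_score;
  if (after_score <= floors.aiTarget && improved) return 'pass';

  // Missing the AI-score target keeps the report usable, so it only warns.
  return 'warn';
}

// Invented score sets, for illustration only.
console.log(classifyLiveResult({ before_score: 80, after_score: 25, mps: 90, fidelity: 85 })); // pass
console.log(classifyLiveResult({ before_score: 80, after_score: 45, mps: 90, fidelity: 85 })); // warn
console.log(classifyLiveResult({ before_score: 80, after_score: 25, mps: 60, fidelity: 85 })); // error
```

Checking the floors first means a rewrite that lowers the AI score by dropping facts is never reported as a pass.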