agent-write-gate 0.1.0__py3-none-any.whl

agent_write_gate-0.1.0.dist-info/METADATA ADDED
@@ -0,0 +1,276 @@
+ Metadata-Version: 2.4
+ Name: agent-write-gate
+ Version: 0.1.0
+ Summary: Agent-hook safety gate for AI-written code -- blocks bidi/invisible Unicode and CJK corruption at the write boundary
+ License: MIT
+ Keywords: agent,hook,safety,unicode,cjk,llm,claude-code,codex,pre-commit
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Classifier: Topic :: Security
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: tomli>=1.1.0; python_version < "3.11"
+ Provides-Extra: cjk
+ Requires-Dist: mojihen>=0.1; extra == "cjk"
+ Provides-Extra: dev
+ Requires-Dist: pytest; extra == "dev"
+ Dynamic: license-file
+
+ # agentgate
+
+ An agent-hook safety gate for AI-written code. Wire it into Claude Code or Codex
+ as a hook; it runs a battery of deterministic checks on the text an agent is about
+ to write (PreToolUse) or just wrote (PostToolUse) and either blocks the write or
+ hands the model a structured feedback blob so the loop self-corrects without a
+ human round-trip.
+
+ ## Problem
+
+ AI coding agents write files directly. They introduce defect classes that
+ traditional linters and CI were not built to catch:
+
+ - **Valid-but-wrong CJK** -- a real but wrong kanji/hanzi/hangul (mojihen's
+   domain; grep and unit tests pass it as false-green).
+ - **Bidi / invisible Unicode** -- Trojan-Source bidi overrides and invisible
+   characters smuggled into source identifiers.
+
+ Both have standalone CLIs that run in CI. What is missing is a single hook at
+ the agent write boundary that (a) speaks the agents' hook protocols, (b) applies
+ one severity-to-action policy across checks, and (c) returns a model-readable
+ feedback blob so the agent rewrites itself.
+
+ ## Positioning (honest)
+
+ agentgate is not a new category -- Claude Code and Codex already expose hook
+ frameworks. agentgate is the packaged, cross-agent gate that bundles
+ AI-write-specific checks, normalizes the two agents' payloads, centralizes
+ block/warn policy, and ships an open check registry so the set grows without
+ forking. The ecosystem value is the gate + registry standard, plus composing the
+ sibling engine mojihen, not any single linter.
+
+ ## Coverage limits
+
+ agentgate only sees writes that flow through a hooked tool call. It does NOT see,
+ and makes no claim about:
+
+ - Files written by shell commands the agent runs (`echo >`, `sed -i`, codegen).
+ - Pre-existing files, generated artifacts, or out-of-band edits.
+ - Editor/IDE agents that do not emit the supported hook payloads.
+ - Codex tool paths other than `apply_patch` (best-effort; fail-open otherwise).
+ - Anything a PostToolUse hook is asked to undo -- it cannot roll back a completed
+   write, only report it back to the model.
+
+ ## M1 built-in checks
+
+ ### unicode (stdlib, always available)
+
+ - **AG-BIDI** (high, always block): bidi control characters U+202A-U+202E,
+   U+2066-U+2069. These are the Trojan-Source vectors with essentially no
+   legitimate use in source files.
+ - **AG-INVIS** (high, code context only): invisible chars (U+200B zero-width
+   space, U+2060 word joiner, U+FEFF stray BOM, U+00AD soft hyphen) flagged only
+   when the file has a code extension AND the char is inside an identifier/string
+   run. U+200C/U+200D (ZWNJ/ZWJ) and U+200E/U+200F (LRM/RLM) are NOT flagged
+   by default -- they are legitimate in Arabic/Persian/Indic text and emoji ZWJ
+   sequences. Strict mode (`unicode.strict_zerowidth = true`) adds ZWNJ/ZWJ only
+   inside ASCII-identifier runs.
+ - **AG-HOMO** (medium, opt-in): Latin-looking Cyrillic/Greek codepoints inside
+   an otherwise-ASCII identifier. Off by default.
+
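A minimal sketch of the AG-BIDI rule described above, assuming a plain per-character scan. The function name `find_bidi_controls` and the returned tuple shape are illustrative assumptions, not the package's API; the shipped check lives in `agentgate/checks/unicode_safety.py`, which is not shown here and may work differently.

```python
import unicodedata
from typing import List, Tuple

# Bidi control ranges named in the AG-BIDI rule: U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = frozenset(
    [chr(cp) for cp in range(0x202A, 0x202F)]
    + [chr(cp) for cp in range(0x2066, 0x206A)]
)


def find_bidi_controls(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, col, description) for each bidi control char, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, f"U+{ord(ch):04X} {name}"))
    return hits


# Hypothetical input: a right-to-left override hidden inside a string literal.
sample = "ok = 1\nif role == 'user\u202e':\n    pass\n"
for hit in find_bidi_controls(sample):
    print(hit)
```

U+200E/U+200F (LRM/RLM) fall outside both ranges, so this sketch leaves them alone, matching the default behavior the list describes.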
+ ### cjk (requires mojihen extra)
+
+ Embeds `mojihen.detect.run_detectors` to catch known LLM CJK corruption: a
+ real but wrong kanji/hanzi that grep and unit tests pass as false-green (rule
+ MH001 and others). Disabled by default so the stdlib-only core installs and runs
+ out of the box.
+
+ ## Install
+
+ Core (stdlib-only, unicode check):
+
+ ```sh
+ pip install agent-write-gate
+ ```
+
+ With CJK check (requires mojihen):
+
+ ```sh
+ pip install agent-write-gate[cjk]
+ ```
+
+ Then enable in config:
+
+ ```toml
+ # agentgate.toml
+ [checks]
+ cjk = true
+ ```
+
+ ## CLI
+
+ ```sh
+ # Primary: agent hook entrypoint (pipe JSON from agent hook system)
+ agentgate hook --stdin
+
+ # Scan files (CI / manual audit)
+ agentgate scan src/ --format tty|json|sarif
+
+ # List checks and their status
+ agentgate checks
+
+ # Version
+ agentgate --version
+ ```
+
+ Exit codes:
+ - `hook`: 0 = allow; 2 = block (deny in Pre / feedback in Post) or error.
+ - `scan`: 0 = no blocking findings; 1 = blocking findings; 2 = error.
+
+ ## Hook setup: Claude Code
+
+ Add to `.claude/settings.json`:
+
+ ```json
+ {
+   "hooks": {
+     "PreToolUse": [
+       {
+         "matcher": "Write|Edit",
+         "hooks": [{"type": "command", "command": "agentgate hook --stdin"}]
+       }
+     ]
+   }
+ }
+ ```
+
+ **PreToolUse** (exit 2) denies the tool call -- the write never happens.
+ **PostToolUse** (exit 2) surfaces feedback to the model after the write.
+ Only PreToolUse provides true prevention. See `hooks/claude-code.md`.
+
+ ## Hook setup: Codex
+
+ Only `apply_patch` is supported (Codex's primary write tool):
+
+ ```json
+ {
+   "hooks": {
+     "PreToolUse": [
+       {
+         "matcher": "apply_patch",
+         "hooks": [{"type": "command", "command": "agentgate hook --stdin"}]
+       }
+     ]
+   }
+ }
+ ```
+
+ See `hooks/codex.md` for PostToolUse setup and coverage limits.
+
+ ## pre-commit
+
+ ```yaml
+ # .pre-commit-config.yaml
+ repos:
+   - repo: https://github.com/hryoma1217/agentgate
+     rev: v0.1.0
+     hooks:
+       - id: agentgate
+ ```
+
+ The bidi / invisible-Unicode checks work out of the box. The CJK corruption
+ check is **off by default**; if you enable it (`[checks.cjk] enabled = true`), the
+ hook also needs `mojihen` in pre-commit's isolated env -- add it via
+ `additional_dependencies`:
+
+ ```yaml
+       - id: agentgate
+         additional_dependencies: ["agent-write-gate[cjk]"]
+ ```
+
+ ## Configuration
+
+ `agentgate.toml` or `[tool.agentgate]` in `pyproject.toml`. Falls back to
+ defaults when no config file is found.
+
+ ```toml
+ # Unicode checks are ON by default; the CJK check is OFF by default.
+ # Enable CJK only if you installed the `agent-write-gate[cjk]` extra.
+ [checks.cjk]
+ enabled = false
+ min_confidence = "high"
+
+ [checks.unicode]
+ enabled = true
+ homoglyph = false
+ strict_zerowidth = false
+ allow_bidi_suppression = false
+ code_extensions = [
+     ".py", ".js", ".ts", ".go", ".rs", ".java",
+     ".c", ".cpp", ".rb", ".php", ".sh", ".sql"
+ ]
+
+ [policy]
+ high = "block"
+ medium = "warn"
+ low = "ignore"
+ ```
+
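One way to read the `[policy]` table together with the `hook` exit codes listed under CLI: map each issue's severity to an action, then block if any action is `block`. The sketch below assumes that reading. `decide` and `hook_exit_code` are hypothetical names, and the shipped logic in `agentgate/policy.py` is not shown here and may differ.

```python
from typing import Dict, Iterable, List

# Defaults copied from the [policy] table above.
DEFAULT_POLICY: Dict[str, str] = {"high": "block", "medium": "warn", "low": "ignore"}


def decide(severities: Iterable[str], policy: Dict[str, str] = DEFAULT_POLICY) -> List[str]:
    """Map each severity to its configured action; unknown severities are ignored."""
    return [policy.get(sev, "ignore") for sev in severities]


def hook_exit_code(actions: Iterable[str]) -> int:
    """Exit 2 if any action blocks, else 0 (the `hook` exit codes from the CLI section)."""
    return 2 if "block" in list(actions) else 0


print(hook_exit_code(decide(["medium", "low"])))   # warn + ignore
print(hook_exit_code(decide(["high", "medium"])))  # block + warn
```

Under this assumption, setting `medium = "block"` in the table would be enough to turn a homoglyph finding into a denied write.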
+ ## Suppression
+
+ Rule-specific only: `agentgate: ignore[AG-INVIS]` on the offending line.
+ **There is no bare `agentgate: ignore`** -- that would let a model launder
+ violations. AG-BIDI is not suppressible unless `allow_bidi_suppression = true`.
+
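A sketch of how a rule-specific marker could be recognized, assuming a simple regex over the offending line. The pattern, the function name `suppressed_rules`, and the comma-separated multi-rule form are assumptions for illustration; only the `agentgate: ignore[AG-INVIS]` form and the AG-BIDI exception come from the text above.

```python
import re
from typing import Set

# Requires a bracketed rule list, so a bare "agentgate: ignore" never matches.
_MARKER = re.compile(r"agentgate:\s*ignore\[([A-Za-z0-9_,\s-]+)\]")


def suppressed_rules(line: str, allow_bidi_suppression: bool = False) -> Set[str]:
    """Return the rule IDs suppressed on this line."""
    rules: Set[str] = set()
    for match in _MARKER.finditer(line):
        for rule in match.group(1).split(","):
            rule = rule.strip()
            if rule:
                rules.add(rule)
    if not allow_bidi_suppression:
        # AG-BIDI stays active unless the config opts in.
        rules.discard("AG-BIDI")
    return rules


print(suppressed_rules("name = 'x'  # agentgate: ignore[AG-INVIS]"))
print(suppressed_rules("name = 'x'  # agentgate: ignore"))
```

Making the bracket mandatory in the pattern is what enforces the "no bare ignore" rule in this sketch: there is no code path that suppresses everything at once.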
+ ## Model-readable block report
+
+ When a write is blocked, stderr looks like:
+
+ ```
+ agentgate: BLOCKED -- 2 issue(s) to fix before this write
+
+ app.py:3:18 cjk/MH001 HIGH '闾' -> likely: 閾
+ '闾' is a known LLM corruption (likely intended: 閾) ...
+ app.py:5:1 unicode/AG-BIDI HIGH U+202E RIGHT-TO-LEFT OVERRIDE
+ Remove the bidi control char; it visually reorders source.
+
+ Fix these and re-emit.
+ ```
+
+ The model sees this as a structured remediation signal and issues a corrective
+ write without human intervention.
+
+ ## Relationship to mojihen
+
+ mojihen is the CJK engine (its own PyPI package, independently useful).
+ agentgate is the cross-agent gate that composes it (optional extra
+ `agent-write-gate[cjk]`) with the stdlib Unicode-safety check, under one policy and
+ one model-readable feedback contract. Two focused packages; the gate + open
+ registry is the ecosystem layer.
+
+ ## Open registry
+
+ Third-party checks can be registered:
+
+ ```python
+ from agentgate.registry import register
+
+ def my_check(event, cfg):
+     # event: WriteEvent, cfg: GateConfig
+     # return List[Issue]
+     return []
+
+ register("my-check", my_check)
+ ```
+
+ ## License
+
+ MIT
agent_write_gate-0.1.0.dist-info/RECORD ADDED
@@ -0,0 +1,18 @@
+ agent_write_gate-0.1.0.dist-info/licenses/LICENSE,sha256=EFPLivHeuSpMRblwoRojAy4LPIQWwUUv5AloeiXURK8,1067
+ agentgate/__init__.py,sha256=FSXOZSQw9IjapLmsC49eP4vrqIp1pYE4NLyHtF3_gjM,122
+ agentgate/adapter.py,sha256=rSqNJEpwPfGhZ8-BVVKVeG11VToHMhExQOrK2qAgn28,4819
+ agentgate/apply_patch.py,sha256=Y7YF917EQBDIVCoJz7Vyt_GwVSo8e-jiL1_y9C0LD3g,2368
+ agentgate/cli.py,sha256=N6eyN-afmQHslzbkXwCyoFEYlUjtyyBxqpxo94lTZMk,17599
+ agentgate/config.py,sha256=ZLYk9UQl9kVsx_1X8xLkerXm2pm6A9Kc6HMWfu5G4Xg,5843
+ agentgate/model.py,sha256=7YRVsmQT9duIja2Cn3XXPjExqc501tYQX3tPJ747Na0,939
+ agentgate/policy.py,sha256=UlvueIacLdcjYTQk4XKknRy2z54FddDRlhR3gvUDhhg,2945
+ agentgate/registry.py,sha256=XmDMBmNLNs_HRhc3LIPN1MJPELM5mLp3nXEowk3tVpU,1859
+ agentgate/report.py,sha256=H0jmFhVFIMxMWSjEmeDZTtsyw1_sSlZs6BZefE-uI5g,7866
+ agentgate/checks/__init__.py,sha256=z1LFE7PdlelmBfqVFGt-GECPwlfl79keKZsIpEYFdlQ,50
+ agentgate/checks/cjk.py,sha256=IjFlJ21NNeABHwpeIab2xQfTE0-S9qSnjuc9GpvSYQA,2547
+ agentgate/checks/unicode_safety.py,sha256=_920jV6YW3B5Blyjiv3nNHsuGGhHytPFn77qjTIKW6A,10846
+ agent_write_gate-0.1.0.dist-info/METADATA,sha256=TINKwXOPM7vIX31RBGVAE2FfrUvA2vyksh2W4PPc_Qc,8589
+ agent_write_gate-0.1.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
+ agent_write_gate-0.1.0.dist-info/entry_points.txt,sha256=Gfb2Xz_dlF6OF6MhPrV3mmWri0gFeZC0DWVHmGiXOhM,49
+ agent_write_gate-0.1.0.dist-info/top_level.txt,sha256=qWUC8JQwEvSXwMD-j8F0kB0LIDQhef7EPRPGeosKWb4,10
+ agent_write_gate-0.1.0.dist-info/RECORD,,
agent_write_gate-0.1.0.dist-info/WHEEL ADDED
@@ -0,0 +1,5 @@
+ Wheel-Version: 1.0
+ Generator: setuptools (82.0.1)
+ Root-Is-Purelib: true
+ Tag: py3-none-any
+
agent_write_gate-0.1.0.dist-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ agentgate = agentgate.cli:main
agent_write_gate-0.1.0.dist-info/licenses/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 hryoma1217
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
agent_write_gate-0.1.0.dist-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ agentgate
agentgate/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """agentgate -- agent-hook safety gate for AI-written code."""
+
+ from __future__ import annotations
+
+ __version__ = "0.1.0"
agentgate/adapter.py ADDED
@@ -0,0 +1,142 @@
+ """adapter.py -- Normalize stdin JSON hook payloads into WriteEvent.
+
+ Supports three payload shapes:
+   1. Claude Code: {hook_event_name, tool_name, tool_input:{file_path, content|new_string}}
+   2. Codex apply_patch: tool_input.command containing *** Begin Patch envelope
+   3. Generic fallback: {file_path|path, content|text|new_string} at top level or tool_input
+
+ Never raises on bad input -- tolerant extraction throughout.
+ """
+
+ from __future__ import annotations
+
+ import json
+ from typing import Any, Dict, Optional
+
+ from .apply_patch import parse_apply_patch
+ from .model import WriteEvent
+
+
+ def _extract_content(obj: Dict[str, Any]) -> str:
+     """Extract content text from a tool_input-like dict."""
+     return (
+         obj.get("content")
+         or obj.get("new_string")
+         or obj.get("text")
+         or ""
+     )
+
+
+ def _extract_path(obj: Dict[str, Any]) -> str:
+     """Extract file path from a tool_input-like dict."""
+     return (
+         obj.get("file_path")
+         or obj.get("path")
+         or "<stdin>"
+     )
+
+
+ def _parse_phase(hook_event_name: Optional[str]) -> str:
+     if not hook_event_name:
+         return "unknown"
+     hen = hook_event_name.lower()
+     if "pre" in hen:
+         return "pre"
+     if "post" in hen:
+         return "post"
+     return "unknown"
+
+
+ def from_stdin_json(raw: str) -> WriteEvent:
+     """Parse a raw stdin string into a WriteEvent.
+
+     Returns a WriteEvent with empty content when nothing can be extracted.
+     Caller should check event.content and exit 0 when empty (nothing to gate).
+     """
+     if not raw or not raw.strip():
+         return WriteEvent(agent="generic", phase="unknown", tool="unknown",
+                           file_path="<stdin>", content="")
+
+     try:
+         payload = json.loads(raw)
+     except (json.JSONDecodeError, ValueError):
+         return WriteEvent(agent="generic", phase="unknown", tool="unknown",
+                           file_path="<stdin>", content="")
+
+     if not isinstance(payload, dict):
+         return WriteEvent(agent="generic", phase="unknown", tool="unknown",
+                           file_path="<stdin>", content="")
+
+     # -----------------------------------------------------------------------
+     # 1. Claude Code shape: has hook_event_name + tool_name
+     # -----------------------------------------------------------------------
+     hook_event_name = payload.get("hook_event_name")
+     tool_name = payload.get("tool_name")
+
+     if hook_event_name and tool_name:
+         phase = _parse_phase(hook_event_name)
+         tool_input = payload.get("tool_input") or {}
+         if not isinstance(tool_input, dict):
+             tool_input = {}
+
+         file_path = _extract_path(tool_input)
+         content = _extract_content(tool_input)
+
+         return WriteEvent(
+             agent="claude-code",
+             phase=phase,
+             tool=str(tool_name),
+             file_path=file_path,
+             content=content,
+         )
+
+     # -----------------------------------------------------------------------
+     # 2. Codex shape: tool_input.command containing apply_patch
+     # -----------------------------------------------------------------------
+     tool_input = payload.get("tool_input")
+     if isinstance(tool_input, dict):
+         command = tool_input.get("command")
+
+         # command may be a string or a list (argv)
+         if isinstance(command, list):
+             command_str = " ".join(str(c) for c in command)
+         elif isinstance(command, str):
+             command_str = command
+         else:
+             command_str = ""
+
+         if command_str and "*** Begin Patch" in command_str:
+             file_path, added_text = parse_apply_patch(command_str)
+             return WriteEvent(
+                 agent="codex",
+                 phase=_parse_phase(payload.get("hook_event_name")),
+                 tool="apply_patch",
+                 file_path=file_path or "<stdin>",
+                 content=added_text,
+             )
+
+         # tool_input present but not apply_patch -- try generic extraction from tool_input
+         file_path = _extract_path(tool_input)
+         content = _extract_content(tool_input)
+         if content:
+             return WriteEvent(
+                 agent="generic",
+                 phase=_parse_phase(payload.get("hook_event_name")),
+                 tool="unknown",
+                 file_path=file_path,
+                 content=content,
+             )
+
+     # -----------------------------------------------------------------------
+     # 3. Generic / fallback: content at top level
+     # -----------------------------------------------------------------------
+     file_path = _extract_path(payload)
+     content = _extract_content(payload)
+
+     return WriteEvent(
+         agent="generic",
+         phase="unknown",
+         tool="unknown",
+         file_path=file_path,
+         content=content,
+     )
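To make the three payload shapes concrete, the sketch below builds one sample of each and classifies it with the same precedence `from_stdin_json` uses: Claude Code keys first, then a `*** Begin Patch` envelope in `tool_input.command`, then the generic fallback. `classify_payload` and the sample values are illustrative stand-ins, not part of the package, and real hook payloads carry more fields than shown.

```python
import json
from typing import Any, Dict


def classify_payload(raw: str) -> str:
    """Return "claude-code", "codex", or "generic" for a raw stdin string."""
    try:
        payload: Any = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "generic"
    if not isinstance(payload, dict):
        return "generic"
    if payload.get("hook_event_name") and payload.get("tool_name"):
        return "claude-code"
    tool_input = payload.get("tool_input")
    if isinstance(tool_input, dict):
        command = tool_input.get("command")
        if isinstance(command, list):
            command = " ".join(str(part) for part in command)
        if isinstance(command, str) and "*** Begin Patch" in command:
            return "codex"
    return "generic"


# Hypothetical sample payloads, one per shape.
samples: Dict[str, Dict[str, Any]] = {
    "claude": {
        "hook_event_name": "PreToolUse",
        "tool_name": "Write",
        "tool_input": {"file_path": "app.py", "content": "print('hi')\n"},
    },
    "codex": {
        "tool_input": {
            "command": ["apply_patch", "*** Begin Patch\n*** Add File: a.py\n+x = 1\n*** End Patch"]
        }
    },
    "generic": {"path": "notes.txt", "text": "hello"},
}

for label, payload in samples.items():
    print(label, "->", classify_payload(json.dumps(payload)))
```

One consequence of this ordering is worth keeping in mind: a payload that carries both `hook_event_name` and `tool_name` takes the first branch even when its `tool_input.command` holds a patch envelope, so the envelope is never parsed in that case.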
agentgate/apply_patch.py ADDED
@@ -0,0 +1,74 @@
+ """apply_patch.py -- Parse a Codex apply_patch envelope.
+
+ Handles the envelope format:
+     *** Begin Patch
+     *** Add File: path/to/file
+     +added line
+     +another added line
+     *** Update File: path/to/other
+      context line
+     +added line
+      more context
+     *** End Patch
+
+ Returns (path, added_text) where:
+     - path is the first Add File or Update File path found
+     - added_text is the joined added lines ('+'-prefixed, excluding '+++' diff headers)
+
+ Returns ("", "") if the envelope is missing or cannot be parsed.
+ """
+
+ from __future__ import annotations
+
+ from typing import Tuple
+
+
+ def parse_apply_patch(command: str) -> Tuple[str, str]:
+     """Parse a Codex apply_patch command string.
+
+     Returns (file_path, added_text). On any parse failure returns ("", "").
+     """
+     if not isinstance(command, str):
+         return ("", "")
+
+     # Locate envelope boundaries
+     begin_marker = "*** Begin Patch"
+     end_marker = "*** End Patch"
+
+     begin_idx = command.find(begin_marker)
+     if begin_idx == -1:
+         return ("", "")
+
+     end_idx = command.find(end_marker, begin_idx)
+     if end_idx == -1:
+         # Accept unterminated envelope -- take everything after begin
+         envelope = command[begin_idx + len(begin_marker):]
+     else:
+         envelope = command[begin_idx + len(begin_marker):end_idx]
+
+     lines = envelope.splitlines()
+
+     file_path = ""
+     added_lines = []
+
+     for line in lines:
+         if line.startswith("*** Add File:"):
+             candidate = line[len("*** Add File:"):].strip()
+             if candidate and not file_path:
+                 file_path = candidate
+         elif line.startswith("*** Update File:"):
+             candidate = line[len("*** Update File:"):].strip()
+             if candidate and not file_path:
+                 file_path = candidate
+         elif line.startswith("*** Delete File:"):
+             # Deletions have no added content; note path but no lines
+             candidate = line[len("*** Delete File:"):].strip()
+             if candidate and not file_path:
+                 file_path = candidate
+         elif line.startswith("+") and not line.startswith("+++"):
+             # Added line: strip the leading '+'
+             added_lines.append(line[1:])
+         # Context lines (no prefix or space prefix) and removed lines ('-') are ignored
+
+     added_text = "\n".join(added_lines)
+     return (file_path, added_text)
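The following condensed restatement of the parsing loop (the name `extract_added` and the sample patch are illustrative) shows what the function returns for a two-file envelope: only the first path is reported, while added lines from every file section are joined into one text.

```python
from typing import Tuple

HEADERS = ("*** Add File:", "*** Update File:", "*** Delete File:")


def extract_added(command: str) -> Tuple[str, str]:
    """Condensed version of the loop above: first path plus all '+' lines."""
    begin = command.find("*** Begin Patch")
    if begin == -1:
        return ("", "")
    end = command.find("*** End Patch", begin)
    # An unterminated envelope runs to the end of the string.
    body = command[begin + len("*** Begin Patch"): end if end != -1 else None]
    path, added = "", []
    for line in body.splitlines():
        header = next((h for h in HEADERS if line.startswith(h)), None)
        if header:
            path = path or line[len(header):].strip()
        elif line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:])
    return (path, "\n".join(added))


patch = (
    "*** Begin Patch\n"
    "*** Add File: a.py\n"
    "+x = 1\n"
    "*** Update File: b.py\n"
    " keep\n"
    "-old\n"
    "+new\n"
    "*** End Patch"
)
print(extract_added(patch))
```

Since the added lines are concatenated, any line number a check reports for a patch counts lines of the joined added text, not lines of the target file.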
agentgate/checks/__init__.py ADDED
@@ -0,0 +1 @@
+ """checks -- Built-in agentgate check modules."""
1
+ """cjk.py -- CJK corruption check using mojihen.
2
+
3
+ Embeds mojihen.detect.run_detectors per line at cfg.cjk.min_confidence.
4
+ Maps mojihen Finding -> agentgate Issue.
5
+
6
+ If mojihen is not importable AND cjk is enabled, raises RuntimeError with
7
+ a clear install message. This error is caught at startup (never silent).
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ from typing import List, TYPE_CHECKING
13
+
14
+ if TYPE_CHECKING:
15
+ from ..model import WriteEvent, Issue
16
+ from ..config import GateConfig
17
+
18
+ from ..model import Issue
19
+
20
+
21
+ def _require_mojihen():
22
+ """Import mojihen and return (run_detectors, Corpus, load_corpus).
23
+
24
+ Raises RuntimeError with install instructions if mojihen is absent.
25
+ """
26
+ try:
27
+ from mojihen.detect import run_detectors, CONFIDENCE_RANK # type: ignore
28
+ from mojihen.corpus import load_corpus # type: ignore
29
+ return run_detectors, CONFIDENCE_RANK, load_corpus
30
+ except ImportError as exc:
31
+ raise RuntimeError(
32
+ "cjk check enabled but mojihen is not installed.\n"
33
+ " Install it with: pip install agent-write-gate[cjk]\n"
34
+ " Or disable it in config: [checks] cjk = false\n"
35
+ f" (Original error: {exc})"
36
+ ) from exc
37
+
38
+
39
+ def run(event: "WriteEvent", cfg: "GateConfig") -> List["Issue"]:
40
+ """Run CJK corruption check using mojihen. Returns list of Issues."""
41
+ run_detectors, CONFIDENCE_RANK, load_corpus = _require_mojihen()
42
+
43
+ content = event.content
44
+ if not content:
45
+ return []
46
+
47
+ min_confidence = cfg.cjk.min_confidence
48
+
49
+ # Load corpus with defaults
50
+ corpus = load_corpus([]) # empty list = built-in corpus only
51
+ allow_set: set = set()
52
+
53
+ issues: List[Issue] = []
54
+ lines = content.splitlines()
55
+
56
+ for i, line in enumerate(lines, start=1):
57
+ findings = run_detectors(
58
+ line,
59
+ i,
60
+ corpus,
61
+ allow_set,
62
+ min_confidence=min_confidence,
63
+ )
64
+ for f in findings:
65
+ # Build suggestion from intended list
66
+ suggestion = ""
67
+ if f.intended:
68
+ suggestion = "likely: " + "|".join(f.intended)
69
+
70
+ issues.append(Issue(
71
+ check="cjk",
72
+ rule_id=f.rule_id,
73
+ severity="high" if f.confidence == "high" else "medium" if f.confidence == "medium" else "low",
74
+ line=f.line,
75
+ col=f.col,
76
+ message=f.message,
77
+ excerpt=f.run,
78
+ suggestion=suggestion,
79
+ ))
80
+
81
+ return issues
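The Finding-to-Issue mapping in `run()` can be exercised without mojihen installed. The sketch below uses stand-in dataclasses whose fields are inferred from the attributes accessed and the keyword arguments passed above; the real `Finding` and `Issue` classes are defined elsewhere and may carry more fields, and `to_issue` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:  # stand-in for mojihen's finding object (fields as accessed in run())
    rule_id: str
    confidence: str
    line: int
    col: int
    message: str
    run: str
    intended: List[str] = field(default_factory=list)


@dataclass
class Issue:  # stand-in for agentgate.model.Issue (fields as passed in run())
    check: str
    rule_id: str
    severity: str
    line: int
    col: int
    message: str
    excerpt: str
    suggestion: str


def to_issue(f: Finding) -> Issue:
    """Mirror the mapping in run(): confidence becomes severity, anything else is low."""
    severity = f.confidence if f.confidence in ("high", "medium") else "low"
    suggestion = "likely: " + "|".join(f.intended) if f.intended else ""
    return Issue("cjk", f.rule_id, severity, f.line, f.col, f.message, f.run, suggestion)


# Hypothetical finding modeled on the MH001 example from the README.
finding = Finding("MH001", "high", 3, 18, "known LLM corruption", "闾", ["閾"])
print(to_issue(finding))
```

Because severity is derived directly from confidence, the `[policy]` table decides what happens to a finding: with the defaults shown in the README, only high-confidence findings would block a write.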