nojibake 0.1.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,18 @@
1
+ # Changelog
2
+
3
+ ## 0.1.0
4
+
5
+ - Initial read-only encoding inspection CLI.
6
+ - Added strict UTF-8, UTF-16 BOM, windows-949, binary NUL, EOL, SHA-256, and path-safety reporting.
7
+ - Added JSON-first result envelope and schema command.
8
+ - Added recursive/file-list `scan` command for agent preflight workflows.
9
+ - Added `guard` command with `unsafe`, `ambiguous`, `mixed-eol`, and `non-utf8` policies.
10
+ - Added compact JSON output for token-efficient agent contexts.
11
+ - Added `--stdin-paths` so callers can pipe changed-file lists without Nojibake executing `git`.
12
+ - Added `.nojibakerc.json` support for `maxFiles`, `maxBytes`, `ignore`, `failOn`, and `allowEncodings` defaults.
13
+ - Added compact scan summary counts and standardized reason codes for agent decisions.
14
+ - Added human-readable `--pretty` output for inspect, scan, and guard.
15
+ - Fixed npm bin symlink entrypoint detection for installed CLI execution.
16
+ - Hardened GitHub Actions permissions for public repository use.
17
+ - Expanded CI to Ubuntu, Windows, and macOS on Node 20/22.
18
+ - Added Windows/CJK smoke coverage for PowerShell, CMD, npm-installed bin shims, Hangul file names, CP949, UTF-16LE `.rc`, CRLF stdin path lists, ADS-shaped paths, and mixed EOL.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Nojibake contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,353 @@
1
+ # Nojibake
2
+
3
+ English | [한국어](#한국어)
4
+
5
+ Nojibake is a read-only Node.js CLI for inspecting file bytes before text processing. It reports encoding signals, BOM state, line-ending summary, SHA-256, and safety decisions in a stable JSON envelope.
6
+
7
+ It is designed for agent/Codex/OpenCode preflight checks: inspect metadata before loading file contents into an LLM context or rewriting files that may be CP949, UTF-16, binary, or line-ending sensitive.
8
+
9
+ Nojibake does not modify files. It is a read-only preflight guard for agents and CI, not an automatic encoding converter.
10
+
11
+ ## Install
12
+
13
+ ```sh
14
+ npm install
15
+ npm run build
16
+ ```
17
+
18
+ Once the package is published to npm:
19
+
20
+ ```sh
21
+ npx nojibake version --json
22
+ ```
23
+
24
+ ## Commands
25
+
26
+ ```sh
27
+ nojibake version --json
28
+ nojibake schema result --json
29
+
30
+ nojibake inspect path --path ./file.txt --json
31
+ nojibake inspect path --root ./safe-root --path ./file.txt --json --compact
32
+ nojibake inspect path --root ./safe-root --path ./file.txt --pretty
33
+
34
+ nojibake scan --root . --json --compact
35
+ nojibake scan --root . --path README.md --path src/cli.ts --json --compact
36
+ git diff --name-only | nojibake scan --root . --stdin-paths --json --compact
37
+ nojibake scan --root . --pretty
38
+
39
+ nojibake guard --root . --fail-on unsafe --json
40
+ git diff --cached --name-only | nojibake guard --root . --stdin-paths --fail-on unsafe,ambiguous,mixed-eol --json --compact
41
+ nojibake guard --root . --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding --json --compact
42
+ ```
43
+
44
+ `scan` recursively scans a root directory unless one or more `--path` options are supplied or `--stdin-paths` is used. Recursive scans skip `.git`, `node_modules`, `dist`, and `coverage` by default; pass `--include-ignored` to include them. Use `--max-files <n>` to bound large repositories and `--max-bytes <n>` to avoid reading oversized files.
45
+
46
+ `--stdin-paths` reads newline-separated paths from stdin. This gives the effect of a changed-files scan (what a `scan --changed` option would do) without Nojibake executing `git` or any other child process.
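As an illustration of what consuming such a list involves, here is a minimal TypeScript sketch of splitting a newline-separated path list. `parsePathList` is a hypothetical helper, not Nojibake's implementation; the CRLF handling is an assumption motivated by the changelog's mention of CRLF stdin path lists.

```typescript
// Hypothetical helper: splits a newline-separated path list as it might
// arrive on stdin. Not taken from Nojibake's source.
function parsePathList(input: string): string[] {
  return input
    .split("\n")
    // Tolerate CRLF input (for example from Windows shells) by dropping a trailing CR.
    .map((line) => line.replace(/\r$/, ""))
    // Skip blank lines; paths are not trimmed because file names may contain spaces.
    .filter((line) => line.length > 0);
}
```

Lines are deliberately not trimmed, so a path with leading or trailing spaces survives unchanged.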
47
+
48
+ Project defaults can be stored in `.nojibakerc.json` at the scan root:
49
+
50
+ ```json
51
+ {
52
+ "maxFiles": 5000,
53
+ "maxBytes": 200000,
54
+ "ignore": ["dist/**", "node_modules/**", "*.png"],
55
+ "failOn": ["unsafe", "ambiguous", "mixed-eol"],
56
+ "allowEncodings": ["utf-8", "ascii", "utf-16le", "windows-949"]
57
+ }
58
+ ```
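The shape of this file can be sketched as a TypeScript type with a small validator. The field types are inferred from the example above; `NojibakeConfig` and `parseConfig` are hypothetical names, and how Nojibake itself validates the file is not documented here.

```typescript
// Field types inferred from the README example; all fields assumed optional.
interface NojibakeConfig {
  maxFiles?: number;
  maxBytes?: number;
  ignore?: string[];
  failOn?: string[];
  allowEncodings?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical validator: parses JSON text and checks the known keys.
function parseConfig(jsonText: string): NojibakeConfig {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const config: NojibakeConfig = {};
  for (const key of ["maxFiles", "maxBytes"] as const) {
    const value = obj[key];
    if (value === undefined) continue;
    if (typeof value !== "number" || !Number.isInteger(value) || value < 0) {
      throw new Error(`${key} must be a non-negative integer`);
    }
    config[key] = value;
  }
  for (const key of ["ignore", "failOn", "allowEncodings"] as const) {
    const value = obj[key];
    if (value === undefined) continue;
    if (!isStringArray(value)) {
      throw new Error(`${key} must be an array of strings`);
    }
    config[key] = value;
  }
  return config;
}
```

Unknown keys are ignored in this sketch; a stricter variant could reject them.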
59
+
60
+ `guard` scans the same inputs and exits non-zero when a requested policy fails.
61
+
62
+ Guard policies:
63
+
64
+ - `unsafe`: failed path safety, unreadable file, invalid bytes, binary NUL, or `safeRead: false`
65
+ - `ambiguous`: bytes are valid under more than one non-ASCII candidate
66
+ - `mixed-eol`: CRLF/LF/CR are mixed
67
+ - `non-utf8`: detected encoding is not `utf-8` or `ascii`
68
+ - `disallowed-encoding`: detected encoding is not listed in `allowEncodings`
69
+
70
+ Scan and guard results include standardized machine-readable reason codes such as `read:unsafe`, `encoding:binary`, `encoding:ambiguous`, `encoding:non-utf8`, `encoding:disallowed`, `eol:mixed`, and `large:file`.
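The policy list above can be read as a set of per-file predicates. The following TypeScript sketch shows one way to evaluate them. `FileResult` and its field names are hypothetical and do not reflect Nojibake's actual result schema, and the exit code `1` stands in for the unspecified "non-zero".

```typescript
type Policy = "unsafe" | "ambiguous" | "mixed-eol" | "non-utf8" | "disallowed-encoding";

// Hypothetical per-file result; field names are illustrative only.
interface FileResult {
  path: string;
  encoding: string;
  safeRead: boolean;
  ambiguous: boolean;
  mixedEol: boolean;
}

// Returns the requested policies that this file violates.
function violatedPolicies(file: FileResult, failOn: Policy[], allowEncodings: string[]): Policy[] {
  const checks: Record<Policy, boolean> = {
    // Assumption: path-safety failures, unreadable files, invalid bytes,
    // and binary NUL all surface as safeRead === false.
    "unsafe": !file.safeRead,
    "ambiguous": file.ambiguous,
    "mixed-eol": file.mixedEol,
    "non-utf8": file.encoding !== "utf-8" && file.encoding !== "ascii",
    "disallowed-encoding": allowEncodings.indexOf(file.encoding) === -1,
  };
  return failOn.filter((policy) => checks[policy]);
}

// Assumption: 1 represents the documented "non-zero" exit status.
function guardExitCode(files: FileResult[], failOn: Policy[], allowEncodings: string[]): number {
  return files.some((file) => violatedPolicies(file, failOn, allowEncodings).length > 0) ? 1 : 0;
}
```

Under this reading, a CP949 file passes `disallowed-encoding` when `windows-949` is listed in `allowEncodings`, yet still fails `non-utf8` if that policy is requested.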
71
+
72
+ ## Agent-token workflow
73
+
74
+ Use compact scan output before reading file contents:
75
+
76
+ ```sh
77
+ nojibake scan --root . --path README.md --path src/cli.ts --json --compact
78
+ ```
79
+
80
+ Compact keys are intentionally short:
81
+
82
+ - `p`: path
83
+ - `l`: byte length
84
+ - `e`: encoding
85
+ - `d`: decision
86
+ - `sr`: safeRead
87
+ - `sw`: safeRewrite
88
+ - `mix`: mixed line endings
89
+ - `why`: standardized reason codes
90
+ - `err`: error codes
91
+
92
+ Compact scan summaries include `ok`, `n`, `bytes`, `safe`, `unsafe`, `amb`, `mix`, `err`, `skip`, and short histograms for decisions, encodings, and reason codes.
93
+
94
+ This lets an agent filter files by metadata before spending tokens on full file contents.
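A minimal TypeScript sketch of that filtering step follows. The key names come from the list above, but the value types (booleans, string arrays) and the helper `readablePaths` are assumptions; where the per-file records sit inside the envelope is not specified here.

```typescript
// Compact per-file record; key names from the README, value types assumed.
interface CompactFile {
  p: string;      // path
  l: number;      // byte length
  e: string;      // encoding
  d: string;      // decision
  sr: boolean;    // safeRead
  sw: boolean;    // safeRewrite
  mix?: boolean;  // mixed line endings
  why?: string[]; // reason codes
  err?: string[]; // error codes
}

// Hypothetical helper: keep only files that are safe to read, error-free,
// and small enough to be worth loading into an agent context.
function readablePaths(files: CompactFile[], maxBytes: number): string[] {
  return files
    .filter((file) => file.sr && file.l <= maxBytes && !(file.err && file.err.length > 0))
    .map((file) => file.p);
}
```

An agent would call `readablePaths` on the parsed scan output and read only the returned paths.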
95
+
96
+ ## Agent integration
97
+
98
+ Installing Nojibake does not automatically intercept an agent's file reads or writes. It is a CLI guard that agents, hooks, CI jobs, or tool wrappers must call explicitly.
99
+
100
+ Recommended integration layers:
101
+
102
+ 1. **Agent instructions**: put a short Nojibake rule in `AGENTS.md`, `CLAUDE.md`, or another agent rules file.
103
+ 2. **Pre-commit hook**: block unsafe staged changes before they enter git history.
104
+ 3. **CI guard**: reject pull requests that introduce unsafe or policy-disallowed files.
105
+ 4. **Tool wrapper**: for deeper integration, wrap an agent's `read_file`, `write_file`, or `patch` tool so it calls `nojibake inspect` before reading and `nojibake guard` before or after edits.
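The tool-wrapper layer could look roughly like the TypeScript sketch below. The `inspect path ... --json` invocation is taken from the Commands section; everything else is an assumption, including the helper names, the location of `safeRead` under `data`, and resolving `filePath` the same way for the CLI and the read.

```typescript
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

// Assumed minimal view of the envelope; data.safeRead is a guess at the schema.
interface InspectEnvelope {
  ok: boolean;
  data?: { safeRead?: boolean };
}

type Inspector = (root: string, filePath: string) => InspectEnvelope;
type Reader = (filePath: string) => string;

// Runs the CLI and parses its JSON output.
// Note: on Windows, an npm-installed .cmd shim may require { shell: true }.
function inspectWithCli(root: string, filePath: string): InspectEnvelope {
  const result = spawnSync(
    "nojibake",
    ["inspect", "path", "--root", root, "--path", filePath, "--json"],
    { encoding: "utf8" },
  );
  if (result.error) throw result.error;
  return JSON.parse(result.stdout) as InspectEnvelope;
}

// Hypothetical read_file wrapper: refuse to read unless inspection says it is safe.
function guardedRead(
  root: string,
  filePath: string,
  inspect: Inspector = inspectWithCli,
  read: Reader = (p) => readFileSync(p, "utf8"),
): string {
  const envelope = inspect(root, filePath);
  if (!envelope.ok || envelope.data?.safeRead !== true) {
    throw new Error(`refusing to read ${filePath}: inspection did not report safeRead`);
  }
  return read(filePath);
}
```

The inspector and reader are injectable so the wrapper can be tested without the CLI installed.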
106
+
107
+ Minimal agent instruction:
108
+
109
+ ```md
110
+ Before reading or editing legacy, Windows, CJK, or unknown-encoding files, run:
111
+
112
+ nojibake scan --root . --path <path> --json --compact
113
+
114
+ Before committing staged changes, run:
115
+
116
+ git diff --cached --name-only | nojibake guard --root . --stdin-paths --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding --json --compact
117
+
118
+ If Nojibake reports `safeRead: false`, `safeRewrite: false`, `ambiguous`, `mixed-eol`, `windows-949`, or `utf-16le`, do not rewrite the file with normal UTF-8 text tools until the encoding and line-ending strategy is explicit.
119
+ ```
120
+
121
+ Pre-commit hook example:
122
+
123
+ ```sh
124
+ #!/bin/sh
125
+ set -eu
126
+
127
+ git diff --cached --name-only --diff-filter=ACMR | nojibake guard \
128
+ --root . \
129
+ --stdin-paths \
130
+ --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding \
131
+ --json --compact
132
+ ```
133
+
134
+ For project-specific policy, add `.nojibakerc.json` and tune `allowEncodings`, `failOn`, `ignore`, `maxFiles`, and `maxBytes`.
135
+
136
+ ## JSON envelope
137
+
138
+ Commands run with `--json` emit a stable result envelope:
139
+
140
+ ```json
141
+ {
142
+ "schemaVersion": "1.0.0",
143
+ "toolVersion": "0.1.0",
144
+ "invocationId": "00000000-0000-0000-0000-000000000000",
145
+ "ok": true,
146
+ "command": "version",
147
+ "summary": "Nojibake 0.1.0",
148
+ "data": {},
149
+ "errors": [],
150
+ "warnings": []
151
+ }
152
+ ```
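For callers that consume this envelope, a TypeScript type and a shallow runtime check might look like the sketch below. The field names match the example above; the element types of `data`, `errors`, and `warnings` are not specified in this README, so they are left as `unknown`. `parseEnvelope` is a hypothetical helper.

```typescript
// Top-level fields as shown in the README example; inner shapes are unknown.
interface ResultEnvelope {
  schemaVersion: string;
  toolVersion: string;
  invocationId: string;
  ok: boolean;
  command: string;
  summary: string;
  data: unknown;
  errors: unknown[];
  warnings: unknown[];
}

// Hypothetical helper: parse CLI output and verify the top-level field types.
function parseEnvelope(text: string): ResultEnvelope {
  const value: unknown = JSON.parse(text);
  if (typeof value !== "object" || value === null) {
    throw new Error("envelope must be a JSON object");
  }
  const fields = value as Record<string, unknown>;
  for (const key of ["schemaVersion", "toolVersion", "invocationId", "command", "summary"]) {
    if (typeof fields[key] !== "string") throw new Error(`${key} must be a string`);
  }
  if (typeof fields["ok"] !== "boolean") throw new Error("ok must be a boolean");
  if (!Array.isArray(fields["errors"]) || !Array.isArray(fields["warnings"])) {
    throw new Error("errors and warnings must be arrays");
  }
  return fields as unknown as ResultEnvelope;
}
```

Checking `schemaVersion` before reading `data` would let a caller fail fast if the envelope format changes.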
153
+
154
+ ## Inspection output
155
+
156
+ `inspect path` is read-only. It reports:
157
+
158
+ - byte length and SHA-256
159
+ - BOM detection for UTF-8, UTF-16LE, and UTF-16BE
160
+ - strict validation for BOM-confirmed text
161
+ - binary NUL detection
162
+ - strict UTF-8 validation without BOM
163
+ - windows-949 validation by round-trip conversion
164
+ - ambiguous results when more than one non-ASCII candidate is valid
165
+ - EOL counts for CRLF, LF, and CR
166
+ - `safeRead` and `safeRewrite: false`
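Several of these checks are simple byte-level operations. The TypeScript sketch below shows one plausible form of them using only Node built-ins. It is not Nojibake's implementation: the function names are hypothetical, and the windows-949 round-trip check is omitted.

```typescript
import { createHash } from "node:crypto";
import { TextDecoder } from "node:util";

type Bom = "utf-8" | "utf-16le" | "utf-16be" | null;

// Detects the three BOMs named in the list above.
function detectBom(bytes: Uint8Array): Bom {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  return null;
}

// Strict validation: a fatal decoder throws on any malformed sequence.
function isStrictUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// A NUL byte in BOM-less content is treated here as a binary signal.
function hasNul(bytes: Uint8Array): boolean {
  return bytes.indexOf(0) !== -1;
}

// Counts line endings at the byte level. This suits ASCII-compatible
// encodings such as UTF-8 and CP949, but not UTF-16.
function countEols(bytes: Uint8Array): { crlf: number; lf: number; cr: number } {
  let crlf = 0;
  let lf = 0;
  let cr = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0d) {
      if (bytes[i + 1] === 0x0a) {
        crlf++;
        i++; // skip the LF that belongs to this CRLF pair
      } else {
        cr++;
      }
    } else if (bytes[i] === 0x0a) {
      lf++;
    }
  }
  return { crlf, lf, cr };
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}
```

In this sketch a file has mixed line endings when more than one of the three counts is non-zero.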
167
+
168
+ ## Path safety
169
+
170
+ Nojibake rejects missing files, directories, Windows alternate data stream notation, symlink traversal, and files outside an optional root boundary.
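Two of these checks can be sketched lexically in TypeScript. `isInsideRoot` and `looksLikeAds` are hypothetical helpers, not Nojibake's code. The alternate-data-stream heuristic (any colon other than a drive-letter prefix) is an assumption, and symlink traversal would additionally require resolving real paths on disk.

```typescript
import * as path from "node:path";

// Lexical root-boundary check: the candidate must resolve to a location at
// or below the root. Symlinks are NOT resolved here.
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const relative = path.relative(resolvedRoot, resolved);
  if (relative === "") return true; // the root itself
  return relative !== ".." && !relative.startsWith(".." + path.sep) && !path.isAbsolute(relative);
}

// Heuristic for Windows alternate data stream notation such as "file.txt:stream".
// A leading drive prefix like "C:\" or "C:/" is not counted as a stream separator.
function looksLikeAds(candidate: string): boolean {
  const withoutDrive = /^[A-Za-z]:[\\/]/.test(candidate) ? candidate.slice(2) : candidate;
  return withoutDrive.indexOf(":") !== -1;
}
```

Comparing the relative path rather than string prefixes avoids treating `/repo2` as being inside `/repo`.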
171
+
172
+ ## Development
173
+
174
+ ```sh
175
+ npm install
176
+ npm run typecheck
177
+ npm test
178
+ npm run build
179
+ npm run smoke
180
+ npm run smoke:windows
181
+ npm run smoke:installed
182
+ npm run pack:check
183
+ ```
184
+
185
+ CI runs the full suite on Ubuntu, Windows, and macOS with Node 20 and 22. A separate Windows shell job checks PowerShell and CMD behavior with CJK file names and stdin path lists.
186
+
187
+ Repository metadata currently targets `https://github.com/don9x2E/nojibake`. Change it before upload only if you plan to publish under a different account or organization.
188
+
189
+ ## License
190
+
191
+ MIT
192
+
193
+ ## 한국어
194
+
195
+ Nojibake는 텍스트 처리 전에 파일 바이트를 먼저 점검하는 **읽기 전용 Node.js CLI**입니다. 인코딩 신호, BOM 상태, 줄바꿈 요약, SHA-256, 안전 판정 결과를 안정적인 JSON envelope로 출력합니다.
196
+
197
+ Codex, OpenCode, Claude Code 같은 agent가 파일 내용을 LLM context에 넣거나 파일을 다시 쓰기 전에, CP949/windows-949, UTF-16, binary, mixed EOL 같은 위험 신호를 먼저 확인하는 용도로 만들었습니다.
198
+
199
+ Nojibake는 파일을 수정하지 않습니다. 자동 인코딩 변환기가 아니라, agent와 CI를 위한 read-only preflight guard입니다.
200
+
201
+ ### 설치
202
+
203
+ ```sh
204
+ npm install
205
+ npm run build
206
+ ```
207
+
208
+ 향후 npm 패키지로 공개된 뒤에는 다음처럼 사용할 수 있습니다.
209
+
210
+ ```sh
211
+ npx nojibake version --json
212
+ ```
213
+
214
+ ### 명령어
215
+
216
+ ```sh
217
+ nojibake version --json
218
+ nojibake schema result --json
219
+
220
+ nojibake inspect path --path ./file.txt --json
221
+ nojibake inspect path --root ./safe-root --path ./file.txt --json --compact
222
+ nojibake inspect path --root ./safe-root --path ./file.txt --pretty
223
+
224
+ nojibake scan --root . --json --compact
225
+ nojibake scan --root . --path README.md --path src/cli.ts --json --compact
226
+ git diff --name-only | nojibake scan --root . --stdin-paths --json --compact
227
+ nojibake scan --root . --pretty
228
+
229
+ nojibake guard --root . --fail-on unsafe --json
230
+ git diff --cached --name-only | nojibake guard --root . --stdin-paths --fail-on unsafe,ambiguous,mixed-eol --json --compact
231
+ nojibake guard --root . --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding --json --compact
232
+ ```
233
+
234
+ `scan`은 지정한 root 아래를 재귀적으로 스캔합니다. `--path`를 하나 이상 넘기거나 `--stdin-paths`를 쓰면 해당 파일 목록만 검사합니다. 기본적으로 `.git`, `node_modules`, `dist`, `coverage`는 건너뜁니다.
235
+
236
+ `--stdin-paths`는 stdin에서 newline-separated path 목록을 읽습니다. Nojibake가 직접 `git`이나 child process를 실행하지 않아도 `git diff --name-only` 결과를 안전하게 연결할 수 있습니다.
237
+
238
+ ### `.nojibakerc.json`
239
+
240
+ 스캔 root에 `.nojibakerc.json`을 두면 프로젝트 기본 정책을 저장할 수 있습니다.
241
+
242
+ ```json
243
+ {
244
+ "maxFiles": 5000,
245
+ "maxBytes": 200000,
246
+ "ignore": ["dist/**", "node_modules/**", "*.png"],
247
+ "failOn": ["unsafe", "ambiguous", "mixed-eol"],
248
+ "allowEncodings": ["utf-8", "ascii", "utf-16le", "windows-949"]
249
+ }
250
+ ```
251
+
252
+ `guard`는 같은 입력을 스캔한 뒤, 요청한 정책에 걸리면 non-zero로 종료합니다.
253
+
254
+ 정책은 다음과 같습니다.
255
+
256
+ - `unsafe`: path safety 실패, 읽기 실패, invalid bytes, binary NUL, 또는 `safeRead: false`
257
+ - `ambiguous`: 여러 non-ASCII 인코딩 후보가 동시에 유효함
258
+ - `mixed-eol`: CRLF/LF/CR 줄바꿈이 섞여 있음
259
+ - `non-utf8`: 감지 인코딩이 `utf-8` 또는 `ascii`가 아님
260
+ - `disallowed-encoding`: 감지 인코딩이 `allowEncodings`에 없음
261
+
262
+ ### Agent token 절약 workflow
263
+
264
+ 파일 내용을 읽기 전에 compact scan으로 먼저 metadata만 확인합니다.
265
+
266
+ ```sh
267
+ nojibake scan --root . --path README.md --path src/cli.ts --json --compact
268
+ ```
269
+
270
+ Compact output은 agent가 token을 쓰기 전에 파일을 필터링하기 쉽게 짧은 key를 사용합니다.
271
+
272
+ - `p`: path
273
+ - `l`: byte length
274
+ - `e`: encoding
275
+ - `d`: decision
276
+ - `sr`: safeRead
277
+ - `sw`: safeRewrite
278
+ - `mix`: mixed line endings
279
+ - `why`: reason codes
280
+ - `err`: error codes
281
+
282
+ ### Agent 통합
283
+
284
+ Nojibake를 설치하는 것만으로 agent의 파일 읽기/쓰기를 자동으로 가로채지는 않습니다. Nojibake는 agent, hook, CI, tool wrapper가 명시적으로 호출해야 하는 CLI guard입니다.
285
+
286
+ 권장 통합 단계는 다음과 같습니다.
287
+
288
+ 1. **Agent instructions**: `AGENTS.md`, `CLAUDE.md` 같은 agent rules 파일에 Nojibake 규칙을 넣습니다.
289
+ 2. **Pre-commit hook**: staged change가 git history에 들어가기 전에 unsafe 파일을 차단합니다.
290
+ 3. **CI guard**: PR/push에서 unsafe 또는 프로젝트 정책에 맞지 않는 파일을 차단합니다.
291
+ 4. **Tool wrapper**: 더 깊게 통합하려면 agent의 `read_file`, `write_file`, `patch` 도구 앞뒤에서 `nojibake inspect` 또는 `nojibake guard`를 호출하도록 감쌉니다.
292
+
293
+ 최소 agent instruction 예시는 다음과 같습니다.
294
+
295
+ ```md
296
+ legacy, Windows, CJK, unknown-encoding 파일을 읽거나 수정하기 전에 실행:
297
+
298
+ nojibake scan --root . --path <path> --json --compact
299
+
300
+ staged change를 commit하기 전에 실행:
301
+
302
+ git diff --cached --name-only | nojibake guard --root . --stdin-paths --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding --json --compact
303
+
304
+ Nojibake가 `safeRead: false`, `safeRewrite: false`, `ambiguous`, `mixed-eol`, `windows-949`, `utf-16le`를 보고하면 인코딩/줄바꿈 전략이 명확해질 때까지 일반 UTF-8 텍스트 도구로 덮어쓰지 마세요.
305
+ ```
306
+
307
+ Pre-commit hook 예시는 다음과 같습니다.
308
+
309
+ ```sh
310
+ #!/bin/sh
311
+ set -eu
312
+
313
+ git diff --cached --name-only --diff-filter=ACMR | nojibake guard \
314
+ --root . \
315
+ --stdin-paths \
316
+ --fail-on unsafe,ambiguous,mixed-eol,disallowed-encoding \
317
+ --json --compact
318
+ ```
319
+
320
+ 프로젝트별 정책은 `.nojibakerc.json`에서 `allowEncodings`, `failOn`, `ignore`, `maxFiles`, `maxBytes`를 조정해 관리합니다.
321
+
322
+ ### `inspect path`가 확인하는 것
323
+
324
+ `inspect path`는 읽기 전용입니다. 다음 정보를 보고합니다.
325
+
326
+ - byte length와 SHA-256
327
+ - UTF-8, UTF-16LE, UTF-16BE BOM 감지
328
+ - BOM 기반 텍스트의 strict validation
329
+ - binary NUL 감지
330
+ - BOM 없는 UTF-8 strict validation
331
+ - windows-949 round-trip validation
332
+ - 여러 non-ASCII 후보가 동시에 유효한 ambiguous result
333
+ - CRLF, LF, CR 줄바꿈 카운트
334
+ - `safeRead`, `safeRewrite: false`
335
+
336
+ ### Path safety
337
+
338
+ Nojibake는 missing file, directory, Windows alternate data stream 표기, symlink traversal, optional root 밖으로 벗어나는 파일을 거부합니다.
339
+
340
+ ### 개발
341
+
342
+ ```sh
343
+ npm install
344
+ npm run typecheck
345
+ npm test
346
+ npm run build
347
+ npm run smoke
348
+ npm run smoke:windows
349
+ npm run smoke:installed
350
+ npm run pack:check
351
+ ```
352
+
353
+ CI는 Ubuntu, Windows, macOS에서 Node 20/22 matrix로 실행됩니다. 별도 Windows shell job은 PowerShell/CMD에서 CJK 파일명과 stdin path list 동작을 확인합니다.