ngworder 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: b04b0792948a68e7ccff32a55eb8619d38eab72f5b3d2826b37529bbbe72bedf
4
+ data.tar.gz: 1ea0b3c87b1f246cb6bd35f2589934915fa240c9addf7640b1721f62e141d9cd
5
+ SHA512:
6
+ metadata.gz: f6dcba982aa0602c7787ebfedbdcaa8cd33fb7191780de58135f41b587890c53e429c4ab8c37eb503efab23236b48fda9bafb098ccc96d72df689c5825ea6597
7
+ data.tar.gz: 6354451f11ed56d97fe1b1f816026f9b75da01cdc8e7aa9c79343cfb13aae350b086b6332d1c98a7014b6f97aece3627c3d4b902fb8ac1a47d59be897e1ec70e
data/AGENTS.md ADDED
@@ -0,0 +1,39 @@
1
+ # Repository Guidelines
2
+
3
+ ## Project Structure & Module Organization
4
+ - `bin/ngworder`: Ruby CLI entry point that parses rules and scans target files.
5
+ - `NGWORDS.txt`: Sample rules file for local testing and documentation.
6
+ - `test/`: `minitest` tests (see Testing Guidelines below).
7
+
8
+ ## Rules File Format (NGWORDS.txt)
9
+ - One rule per line: `NG_WORD !EXCLUDE1 !EXCLUDE2`.
10
+ - `#` starts a comment; escape it as `\#` if you need a literal `#`.
11
+ - `!` splits exclusions; escape as `\!` to use a literal exclamation.
12
+ - `/.../` denotes a Ruby regular expression; use `\/` for a literal slash.
13
+ - Matching is substring-based (partial match). A match is suppressed only when it overlaps an exclusion match on the same line.
14
+
15
+ Example:
16
+ ```
17
+ ユーザ !ユーザー # ユーザー is excluded
18
+ /アーキテクチャー?/ !/アーキテクチャ/
19
+ ```
20
+
21
+ ## Build, Test, and Development Commands
22
+ - `bin/ngworder target.md`: run the checker against one or more files (the rules file defaults to `NGWORDS.txt`).
23
+ - `bin/ngworder --rule=NGWORDS.txt target.md`: run with an explicit rules file.
24
+ - `bin/ngworder --rg target.md`: prefilter literal rules with `rg` (regex rules still scan normally).
25
+ - `ruby -c bin/ngworder`: quick syntax check for the CLI script.
26
+ - `gem build ngworder.gemspec`: build the RubyGems package.
27
+
28
+ ## Coding Style & Naming Conventions
29
+ - Ruby code uses 2-space indentation and snake_case names.
30
+ - Keep CLI output stable: `file:line:col match NG:<rule>`.
31
+ - Prefer ASCII in config and code unless Japanese examples are required.
32
+
33
+ ## Testing Guidelines
34
+ - Uses `minitest` in `test/`. Run with `rake test` or `ruby -Ilib test/test_ngworder.rb`.
35
+ - Cover: basic literal match, regex match, escaped `#`/`!`/`/`, and exclusion overlap.
36
+
37
+ ## Commit & Pull Request Guidelines
38
+ - No established history; use short, imperative commit messages (e.g., "Add rule parser").
39
+ - PRs should include a brief description, example `rules.txt`, and sample CLI output.
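The rule-line format documented in AGENTS.md above (unescaped `#` starts a comment, unescaped `!` starts an exclusion, a backslash escapes the next character) can be illustrated with a small standalone sketch. This is not the gem's parser: `parse_rule_line` is a hypothetical helper, it drops every escaping backslash rather than only those before `\`, `/`, `#`, and `!`, and it leaves `/.../` regex rules as plain strings.

```ruby
# Hypothetical helper illustrating the documented rule-line format.
# Assumptions: a backslash makes the next character literal, and
# regex rules (/.../) are not interpreted here.
def parse_rule_line(line)
  parts = ["".dup]
  escaped = false

  line.each_char do |ch|
    if escaped
      parts.last << ch      # escaped character is taken literally
      escaped = false
    elsif ch == "\\"
      escaped = true        # drop the backslash, keep the next character
    elsif ch == "#"
      break                 # unescaped '#' starts a comment
    elsif ch == "!"
      parts << "".dup       # unescaped '!' starts a new exclusion
    else
      parts.last << ch
    end
  end

  word, *excludes = parts.map(&:strip)
  return nil if word.empty?

  { word: word, excludes: excludes.reject(&:empty?) }
end

p parse_rule_line("ユーザ !ユーザー # comment")
p parse_rule_line('C\# !C\#\!')
```

A comment-only or blank line yields `nil`, which mirrors how such lines produce no rule.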
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Masanori Kado
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/NGWORDS.txt ADDED
@@ -0,0 +1,7 @@
1
+ # One rule per line: NG_WORD [!EXCLUSION...]
2
+ # /.../ is a regular expression
3
+ # A comment runs from # to the end of the line (\# is a literal #)
4
+
5
+ ユーザ !ユーザー
6
+ インタフェース
7
+ /アーキテクチャー?/
data/README.md ADDED
@@ -0,0 +1,49 @@
1
+ # ngworder
2
+
3
+ Simple CLI that scans Japanese text for NG (prohibited) words defined in a plain-text rules file.
4
+
5
+ ## Install
6
+ ```
7
+ gem build ngworder.gemspec
8
+ gem install ./ngworder-0.1.0.gem
9
+ ```
10
+
11
+ ## Usage
12
+ ```
13
+ ngworder target.md
14
+ ngworder --rule=NGWORDS.txt target.md
15
+ ngworder --rg target.md
16
+ ngworder --help
17
+ ```
18
+
19
+ ## Test
20
+ ```
21
+ rake test
22
+ ```
23
+
24
+ ## Rules File (NGWORDS.txt)
25
+ - One rule per line: `NG_WORD !EXCLUDE1 !EXCLUDE2`
26
+ - `#` starts a comment; escape as `\#`
27
+ - `!` splits exclusions; escape as `\!`
28
+ - `/.../` denotes a Ruby regex; escape `/` as `\/`
29
+ - Matching is substring-based; a match is suppressed only when it overlaps an exclusion match on the same line
30
+
31
+ Example:
32
+ ```
33
+ ユーザ !ユーザー
34
+ インタフェース
35
+ /アーキテクチャー?/
36
+ ```
37
+
38
+ ## Output
39
+ ```
40
+ path/to/file:line:col match NG:<rule>
41
+ ```
42
+
43
+ ## Performance
44
+ - `--rg` prefilters literal rules with ripgrep (optional). Regex rules still scan normally.
45
+ - If `rg` is missing, ngworder falls back to Ruby scanning.
46
+ - Install `rg` (ripgrep): https://github.com/BurntSushi/ripgrep#installation
47
+
48
+ ## License
49
+ MIT
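The README's exclusion rule and output format can be sketched together for the literal-only case. The names `literal_spans` and `report` are hypothetical and not part of the gem; the sketch assumes half-open character spans and a 1-based column, which matches the `file:line:col match NG:<rule>` shape shown above.

```ruby
# Standalone sketch of "report a match unless it overlaps an exclusion".
# `literal_spans` and `report` are illustrative names, not the gem's API.

# Return [start, end) character offsets of every occurrence of needle in line.
def literal_spans(line, needle)
  spans = []
  from = 0
  while (idx = line.index(needle, from))
    spans << [idx, idx + needle.length]
    from = idx + 1 # step by one so overlapping occurrences are found too
  end
  spans
end

# Build output lines in the documented `file:line:col match NG:<rule>` shape.
def report(path, line_no, line, word, excludes)
  blocked = excludes.flat_map { |ex| literal_spans(line, ex) }
  literal_spans(line, word).filter_map do |s, e|
    # Two half-open spans overlap when each starts before the other ends.
    next if blocked.any? { |bs, be| bs < e && be > s }

    "#{path}:#{line_no}:#{s + 1} #{word} NG:#{word}"
  end
end

puts report("doc.md", 3, "ユーザとユーザー", "ユーザ", ["ユーザー"])
```

In this input the first `ユーザ` is reported, while the second is suppressed because it lies inside the excluded `ユーザー`.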
data/bin/ngworder ADDED
@@ -0,0 +1,10 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ begin
5
+ require "ngworder"
6
+ rescue LoadError
7
+ require_relative "../lib/ngworder"
8
+ end
9
+
10
+ exit Ngworder::CLI.run(ARGV)
data/lib/ngworder/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Ngworder
4
+ VERSION = "0.1.0"
5
+ end
data/lib/ngworder.rb ADDED
@@ -0,0 +1,334 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "optparse"
4
+ require "tempfile"
5
+ require_relative "ngworder/version"
6
+
7
+ module Ngworder
8
+ Rule = Struct.new(:matcher, :label, :excludes)
9
+ Matcher = Struct.new(:type, :pattern, :label)
10
+
11
+ module Parser
12
+ module_function
13
+
14
+ def strip_comment(line)
15
+ out = +""
16
+ escaped = false
17
+
18
+ line.each_char do |ch|
19
+ if escaped
20
+ out << ch
21
+ escaped = false
22
+ next
23
+ end
24
+
25
+ if ch == '\\'
26
+ escaped = true
27
+ out << ch
28
+ next
29
+ end
30
+
31
+ break if ch == "#"
32
+
33
+ out << ch
34
+ end
35
+
36
+ out
37
+ end
38
+
39
+ def split_unescaped_bang(line)
40
+ parts = []
41
+ current = +""
42
+ escaped = false
43
+
44
+ line.each_char do |ch|
45
+ if escaped
46
+ current << ch
47
+ escaped = false
48
+ next
49
+ end
50
+
51
+ if ch == '\\'
52
+ escaped = true
53
+ current << ch
54
+ next
55
+ end
56
+
57
+ if ch == "!"
58
+ parts << current
59
+ current = +""
60
+ next
61
+ end
62
+
63
+ current << ch
64
+ end
65
+
66
+ parts << current
67
+ parts
68
+ end
69
+
70
+ def unescape_token(str)
71
+ out = +""
72
+ i = 0
73
+
74
+ while i < str.length
75
+ ch = str[i]
76
+
77
+ if ch == '\\' && i + 1 < str.length
78
+ nxt = str[i + 1]
79
+ if ['\\', '/', '#', '!'].include?(nxt)
80
+ out << nxt
81
+ i += 2
82
+ next
83
+ end
84
+ end
85
+
86
+ out << ch
87
+ i += 1
88
+ end
89
+
90
+ out
91
+ end
92
+
93
+ def unescaped_delim?(str, index)
94
+ backslashes = 0
95
+ i = index - 1
96
+ while i >= 0 && str[i] == '\\'
97
+ backslashes += 1
98
+ i -= 1
99
+ end
100
+ backslashes.even?
101
+ end
102
+
103
+ def parse_matcher(raw)
104
+ trimmed = raw.strip
105
+ return nil if trimmed.empty?
106
+
107
+ if trimmed.start_with?("/") && trimmed.length >= 2 && trimmed.end_with?("/")
108
+ return nil unless unescaped_delim?(trimmed, trimmed.length - 1)
109
+
110
+ body = trimmed[1..-2]
111
+ pattern = Regexp.new(unescape_token(body))
112
+ return Matcher.new(:regex, pattern, trimmed)
113
+ end
114
+
115
+ literal = unescape_token(trimmed)
116
+ Matcher.new(:literal, literal, trimmed)
117
+ end
118
+
119
+ def build_rules(path)
120
+ rules = []
121
+
122
+ File.readlines(path, chomp: true).each do |line|
123
+ content = strip_comment(line).strip
124
+ next if content.empty?
125
+
126
+ parts = split_unescaped_bang(content)
127
+ base = parse_matcher(parts.shift || "")
128
+ next unless base
129
+
130
+ excludes = parts.map { |part| parse_matcher(part) }.compact
131
+ rules << Rule.new(base, base.label, excludes)
132
+ end
133
+
134
+ rules
135
+ end
136
+ end
137
+
138
+ module MatcherEngine
139
+ module_function
140
+
141
+ def match_spans(line, matcher)
142
+ spans = []
143
+
144
+ if matcher.type == :regex
145
+ line.scan(matcher.pattern) do
146
+ text = Regexp.last_match(0)
147
+ next if text.nil? || text.empty?
148
+
149
+ start_idx = Regexp.last_match.begin(0)
150
+ end_idx = Regexp.last_match.end(0)
151
+ spans << [start_idx, end_idx, text]
152
+ end
153
+ else
154
+ needle = matcher.pattern
155
+ return spans if needle.empty?
156
+
157
+ start_idx = 0
158
+ while (found = line.index(needle, start_idx))
159
+ spans << [found, found + needle.length, needle]
160
+ start_idx = found + 1
161
+ end
162
+ end
163
+
164
+ spans
165
+ end
166
+
167
+ def excluded?(match_span, exclude_spans)
168
+ match_start, match_end = match_span[0], match_span[1]
169
+
170
+ exclude_spans.any? do |spans|
171
+ spans.any? do |span|
172
+ span_start, span_end = span[0], span[1]
173
+ span_start < match_end && span_end > match_start
174
+ end
175
+ end
176
+ end
177
+ end
178
+
179
+ module RgBackend
180
+ module_function
181
+
182
+ def available?
183
+ system("rg", "--version", out: File::NULL, err: File::NULL)
184
+ end
185
+
186
+ def prefilter_lines(files, rules)
187
+ literals = rules.map { |rule| rule.matcher.pattern }.uniq
188
+ return {} if literals.empty?
189
+
190
+ Tempfile.create("ngworder_rg") do |tmp|
191
+         literals.each { |literal| tmp.puts literal }
192
+         tmp.flush
193
+
194
+         cmd = ["rg", "--fixed-strings", "--line-number", "--with-filename", "--no-heading", "--color=never", "-f", tmp.path, "--"] + files
195
+ results = Hash.new { |hash, key| hash[key] = [] }
196
+
197
+ IO.popen(cmd, "r") do |io|
198
+ io.each_line do |line|
199
+ path, line_no, content = line.chomp.split(":", 3)
200
+ next unless path && line_no
201
+
202
+ results[path] << [line_no.to_i, content || ""]
203
+ end
204
+ end
205
+
206
+ results
207
+ end
208
+ rescue Errno::ENOENT
209
+ nil
210
+ end
211
+ end
212
+
213
+ class CLI
214
+ def self.run(argv)
215
+ options = {}
216
+
217
+ OptionParser.new do |opts|
218
+ opts.banner = "Usage: ngworder [--rule=NGWORDS.txt] <files...>"
219
+ opts.on("--rule=PATH", "Rules file path (default: NGWORDS.txt)") { |value| options[:rule] = value }
220
+ opts.on("--rg", "Use ripgrep for literal-only prefiltering") { options[:rg] = true }
221
+ opts.on("-h", "--help", "Show help") do
222
+ puts opts
223
+ return 0
224
+ end
225
+ end.parse!(argv)
226
+
227
+ rule_path = options[:rule]
228
+ rule_path = "NGWORDS.txt" if rule_path.nil? || rule_path.strip.empty?
229
+
230
+ if argv.empty?
231
+ warn "No input files provided"
232
+ return 2
233
+ end
234
+
235
+ unless File.file?(rule_path)
236
+ warn "Rules file not found: #{rule_path}"
237
+ return 2
238
+ end
239
+
240
+ rules = Parser.build_rules(rule_path)
241
+ warn "No rules loaded from #{rule_path}" if rules.empty?
242
+
243
+ found = false
244
+
245
+ argv.each do |path|
246
+ warn "Skip missing file: #{path}" unless File.file?(path)
247
+ end
248
+
249
+ existing_files = argv.select { |path| File.file?(path) }
250
+
251
+ literal_rules, regex_rules = rules.partition { |rule| rule.matcher.type == :literal }
252
+
253
+ rg_enabled = options[:rg] && RgBackend.available?
254
+ warn "rg not found; falling back to Ruby scan" if options[:rg] && !rg_enabled
255
+
256
+ rg_lines = if rg_enabled && !literal_rules.empty?
257
+ RgBackend.prefilter_lines(existing_files, literal_rules)
258
+ else
259
+ nil
260
+ end
261
+       # prefilter_lines returns nil when rg cannot be executed; nil means a full Ruby scan below.
262
+
263
+ existing_files.each do |path|
264
+ if rg_lines && regex_rules.empty?
265
+ candidates = rg_lines[path]
266
+ next if candidates.nil? || candidates.empty?
267
+
268
+ candidates.each do |line_no, line|
269
+ literal_rules.each do |rule|
270
+ matches = MatcherEngine.match_spans(line, rule.matcher)
271
+ next if matches.empty?
272
+
273
+ exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
274
+
275
+ matches.each do |span|
276
+ next if MatcherEngine.excluded?(span, exclude_spans)
277
+
278
+ found = true
279
+ col_no = span[0] + 1
280
+ puts "#{path}:#{line_no}:#{col_no} #{span[2]} NG:#{rule.label}"
281
+ end
282
+ end
283
+ end
284
+ next
285
+ end
286
+
287
+ candidate_lines = rg_lines ? rg_lines[path] : nil
288
+ candidate_set = if candidate_lines
289
+ candidate_lines.each_with_object({}) { |(line_no, _line), acc| acc[line_no] = true }
290
+ else
291
+ nil
292
+ end
293
+
294
+ File.readlines(path, chomp: true).each_with_index do |line, idx|
295
+ line_no = idx + 1
296
+
297
+ regex_rules.each do |rule|
298
+ matches = MatcherEngine.match_spans(line, rule.matcher)
299
+ next if matches.empty?
300
+
301
+ exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
302
+
303
+ matches.each do |span|
304
+ next if MatcherEngine.excluded?(span, exclude_spans)
305
+
306
+ found = true
307
+ col_no = span[0] + 1
308
+ puts "#{path}:#{line_no}:#{col_no} #{span[2]} NG:#{rule.label}"
309
+ end
310
+ end
311
+
312
+ next if candidate_set && !candidate_set.key?(line_no)
313
+
314
+ literal_rules.each do |rule|
315
+ matches = MatcherEngine.match_spans(line, rule.matcher)
316
+ next if matches.empty?
317
+
318
+ exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
319
+
320
+ matches.each do |span|
321
+ next if MatcherEngine.excluded?(span, exclude_spans)
322
+
323
+ found = true
324
+ col_no = span[0] + 1
325
+ puts "#{path}:#{line_no}:#{col_no} #{span[2]} NG:#{rule.label}"
326
+ end
327
+ end
328
+ end
329
+ end
330
+
331
+ found ? 1 : 0
332
+ end
333
+ end
334
+ end
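The `/.../` detection in `parse_matcher` depends on whether the closing slash is itself escaped, which comes down to the parity of the backslashes directly before it. A minimal standalone sketch of that idea follows. `classify_token` is a hypothetical name, and unlike the code above it falls back to a literal when the closing slash is escaped, instead of dropping the rule.

```ruby
# Hypothetical helper: classify a rule token as a regex or a literal.
# Assumption: an escaped closing slash makes the token a literal here.
def classify_token(token)
  t = token.strip
  if t.length >= 2 && t.start_with?("/") && t.end_with?("/")
    body = t[1..-2]
    # Backslashes directly before the closing slash; an odd count escapes it.
    trailing = body[/\\*\z/].length
    # Simplification: only "\/" is unescaped before compiling the pattern.
    return [:regex, Regexp.new(body.gsub("\\/", "/"))] if trailing.even?
  end
  [:literal, t]
end

p classify_token("/アーキテクチャー?/")
p classify_token("インタフェース")
p classify_token('/a\/b/')
```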
metadata ADDED
@@ -0,0 +1,51 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ngworder
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Masanori Kado
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2026-01-11 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: CLI tool to scan files for NG words with per-rule exclusions.
14
+ email:
15
+ - kdmsnr@gmail.com
16
+ executables:
17
+ - ngworder
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - AGENTS.md
22
+ - LICENSE
23
+ - NGWORDS.txt
24
+ - README.md
25
+ - bin/ngworder
26
+ - lib/ngworder.rb
27
+ - lib/ngworder/version.rb
28
+ homepage: https://github.com/kdmsnr/ngworder
29
+ licenses:
30
+ - MIT
31
+ metadata: {}
32
+ post_install_message:
33
+ rdoc_options: []
34
+ require_paths:
35
+ - lib
36
+ required_ruby_version: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '2.7'
41
+ required_rubygems_version: !ruby/object:Gem::Requirement
42
+ requirements:
43
+ - - ">="
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ requirements: []
47
+ rubygems_version: 3.0.3.1
48
+ signing_key:
49
+ specification_version: 4
50
+ summary: Extract NG words from Japanese text using simple rule files
51
+ test_files: []