RubyGems - ngworder - Versions diffs - 0.1.0 - Mend

ngworder 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: b04b0792948a68e7ccff32a55eb8619d38eab72f5b3d2826b37529bbbe72bedf
+  data.tar.gz: 1ea0b3c87b1f246cb6bd35f2589934915fa240c9addf7640b1721f62e141d9cd
+SHA512:
+  metadata.gz: f6dcba982aa0602c7787ebfedbdcaa8cd33fb7191780de58135f41b587890c53e429c4ab8c37eb503efab23236b48fda9bafb098ccc96d72df689c5825ea6597
+  data.tar.gz: 6354451f11ed56d97fe1b1f816026f9b75da01cdc8e7aa9c79343cfb13aae350b086b6332d1c98a7014b6f97aece3627c3d4b902fb8ac1a47d59be897e1ec70e
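
The digests above cover the two members of the `.gem` archive (a `.gem` file is a tar archive). A minimal sketch of checking one of them locally follows. It is not part of ngworder: `sha256_matches?` is a hypothetical helper, and the example assumes the archive has already been unpacked so that `data.tar.gz` sits in the current directory.

```ruby
require "digest"

# Hypothetical helper (not part of ngworder): true when the SHA256 hex digest
# of the file at `path` equals `expected_hex`.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Expected value copied from the SHA256 entry for data.tar.gz above.
expected = "1ea0b3c87b1f246cb6bd35f2589934915fa240c9addf7640b1721f62e141d9cd"

# Assumption: data.tar.gz was extracted into the current directory.
if File.file?("data.tar.gz")
  puts sha256_matches?("data.tar.gz", expected) ? "data.tar.gz OK" : "data.tar.gz MISMATCH"
end
```

The same helper could be pointed at `metadata.gz` with its own expected digest.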

data/AGENTS.md ADDED

@@ -0,0 +1,39 @@
+# Repository Guidelines
+## Project Structure & Module Organization
+- `bin/ngworder`: Ruby CLI entry point that parses rules and scans target files.
+- `NGWORDS.txt`: Sample rules file for local testing and documentation.
+- `test/`: `minitest` suite (not included in the packaged gem).
+## Rules File Format (NGWORDS.txt)
+- One rule per line: `NG_WORD !EXCLUDE1 !EXCLUDE2`.
+- `#` starts a comment; escape it as `\#` if you need a literal `#`.
+- `!` splits exclusions; escape as `\!` to use a literal exclamation.
+- `/.../` denotes a Ruby regular expression; use `\/` for a literal slash.
+- Matching is substring-based (partial match). A match is suppressed only when it overlaps an exclusion match on the same line.
+Example:
+```
+ユーザ !ユーザー # exclude ユーザー
+/アーキテクチャー?/ !/アーキテクチャ/
+```
+## Build, Test, and Development Commands
+- `bin/ngworder target.md`: run the checker against one or more files (the rules file defaults to `NGWORDS.txt`).
+- `bin/ngworder --rule=NGWORDS.txt target.md`: run with an explicit rules file.
+- `bin/ngworder --rg target.md`: prefilter literal rules with `rg` (regex rules still scan normally).
+- `ruby -c bin/ngworder`: quick syntax check for the CLI script.
+- `gem build ngworder.gemspec`: build the RubyGems package.
+## Coding Style & Naming Conventions
+- Ruby code uses 2-space indentation and snake_case names.
+- Keep CLI output stable: `file:line:col  match  NG:<rule>`.
+- Prefer ASCII in config and code unless Japanese examples are required.
+## Testing Guidelines
+- Uses `minitest` in `test/`. Run with `rake test` or `ruby -Ilib test/test_ngworder.rb`.
+- Cover: basic literal match, regex match, escaped `#`/`!`/`/`, and exclusion overlap.
+## Commit & Pull Request Guidelines
+- No established history; use short, imperative commit messages (e.g., "Add rule parser").
+- PRs should include a brief description, example `rules.txt`, and sample CLI output.
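
The rule-line format described in AGENTS.md can be illustrated with a short sketch. This is a simplified illustration, not the gem's parser: `split_rule` is a hypothetical name, it assumes a backslash escapes exactly the next character, and it keeps escape sequences verbatim instead of unescaping them.

```ruby
# Simplified illustration of the rule-line format (hypothetical helper).
# Splits one rule line into the NG word and its exclusions.
def split_rule(line)
  fields = [+""]
  # Each token is either an escaped pair (backslash + char) or a single char.
  line.scan(/\\.|./) do |tok|
    case tok
    when "#" then break           # unescaped '#': the rest is a comment
    when "!" then fields << +""   # unescaped '!': start a new exclusion
    else fields.last << tok       # ordinary char or escaped pair, kept verbatim
    end
  end
  word, *excludes = fields.map(&:strip)
  { word: word, excludes: excludes.reject(&:empty?) }
end

p split_rule("ユーザ !ユーザー # note")
p split_rule('C\# !C\#\!')
```

The first call yields the word `ユーザ` with one exclusion, `ユーザー`. In the second, the escaped `\#` and `\!` stay inside their fields rather than starting a comment or a new exclusion.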

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Masanori Kado
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/NGWORDS.txt ADDED

@@ -0,0 +1,7 @@
+# One rule per line: NG_WORD [!EXCLUDE...]
+# /.../ is a regular expression
+# A comment runs from # to the end of the line (\# is treated as a literal character)
+ユーザ !ユーザー
+インタフェース
+/アーキテクチャー?/
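
To see what the first sample rule does in practice, here is a minimal sketch of literal matching with overlap-based exclusion. It mirrors the approach in `lib/ngworder.rb` but is not the gem's code. `spans` and `ng_hits` are hypothetical names, and the sample sentence is made up for illustration.

```ruby
# Character spans [start, stop) of every occurrence of `needle` in `line`.
def spans(line, needle)
  out = []
  pos = 0
  while (hit = line.index(needle, pos))
    out << [hit, hit + needle.length]
    pos = hit + 1
  end
  out
end

# Spans of `word` that do not overlap any occurrence of an exclusion.
def ng_hits(line, word, excludes)
  blocked = excludes.flat_map { |ex| spans(line, ex) }
  spans(line, word).reject do |s, e|
    # Two half-open spans overlap when each starts before the other ends.
    blocked.any? { |bs, be| bs < e && be > s }
  end
end

line = "ユーザとユーザーの違い"
p ng_hits(line, "ユーザ", ["ユーザー"])  # → [[0, 3]]
```

Only the first `ユーザ` is reported. The second occurrence lies inside `ユーザー`, so the exclusion suppresses it.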

data/README.md ADDED

@@ -0,0 +1,49 @@
+# ngworder
+Simple CLI to extract NG words from Japanese text using a plain text rules file.
+## Install
+```
+gem build ngworder.gemspec
+gem install ./ngworder-0.1.0.gem
+```
+## Usage
+```
+ngworder target.md
+ngworder --rule=NGWORDS.txt target.md
+ngworder --rg target.md
+ngworder --help
+```
+## Test
+```
+rake test
+```
+## Rules File (NGWORDS.txt)
+- One rule per line: `NG_WORD !EXCLUDE1 !EXCLUDE2`
+- `#` starts a comment; escape as `\#`
+- `!` splits exclusions; escape as `\!`
+- `/.../` denotes a Ruby regex; escape `/` as `\/`
+- Matching is substring-based; a match is dropped only when it overlaps an exclusion match on the same line
+Example:
+```
+ユーザ !ユーザー
+インタフェース
+/アーキテクチャー?/
+```
+## Output
+```
+path/to/file:line:col  match  NG:<rule>
+```
+## Performance
+- `--rg` prefilters literal rules with ripgrep (optional). Regex rules still scan normally.
+- If `rg` is missing, ngworder falls back to Ruby scanning.
+- Install `rg` (ripgrep): https://github.com/BurntSushi/ripgrep#installation
+## License
+MIT
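
Because the README fixes the output shape as `path:line:col  match  NG:<rule>`, downstream scripts can parse it. The following is a minimal sketch under stated assumptions: `parse_finding` is a hypothetical helper, the fields are taken to be separated by exactly two spaces as shown in the README, and a matched text that itself contains two consecutive spaces would be split ambiguously.

```ruby
# Hypothetical helper: parse one ngworder output line into a Hash, or nil
# when the line does not follow the documented format.
def parse_finding(text)
  m = /\A(?<path>.+?):(?<line>\d+):(?<col>\d+)  (?<match>.*?)  NG:(?<rule>.*)\z/.match(text.chomp)
  return nil unless m

  { path: m[:path], line: m[:line].to_i, col: m[:col].to_i, match: m[:match], rule: m[:rule] }
end

# Made-up sample line in the documented format.
f = parse_finding("docs/guide.md:12:5  ユーザ  NG:ユーザ")
puts "#{f[:path]} line #{f[:line]}, column #{f[:col]}: #{f[:match]} (rule #{f[:rule]})"
```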

data/bin/ngworder ADDED

@@ -0,0 +1,10 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+begin
+  require "ngworder"
+rescue LoadError
+  require_relative "../lib/ngworder"
+end
+exit Ngworder::CLI.run(ARGV)

data/lib/ngworder/version.rb ADDED

@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+module Ngworder
+  VERSION = "0.1.0"
+end

data/lib/ngworder.rb ADDED

@@ -0,0 +1,334 @@
+# frozen_string_literal: true
+require "optparse"
+require "tempfile"
+require_relative "ngworder/version"
+module Ngworder
+  Rule = Struct.new(:matcher, :label, :excludes)
+  Matcher = Struct.new(:type, :pattern, :label)
+  module Parser
+    module_function
+    def strip_comment(line)
+      out = +""
+      escaped = false
+      line.each_char do |ch|
+        if escaped
+          out << ch
+          escaped = false
+          next
+        end
+        if ch == '\\'
+          escaped = true
+          out << ch
+          next
+        end
+        break if ch == "#"
+        out << ch
+      end
+      out
+    end
+    def split_unescaped_bang(line)
+      parts = []
+      current = +""
+      escaped = false
+      line.each_char do |ch|
+        if escaped
+          current << ch
+          escaped = false
+          next
+        end
+        if ch == '\\'
+          escaped = true
+          current << ch
+          next
+        end
+        if ch == "!"
+          parts << current
+          current = +""
+          next
+        end
+        current << ch
+      end
+      parts << current
+      parts
+    end
+    def unescape_token(str)
+      out = +""
+      i = 0
+      while i < str.length
+        ch = str[i]
+        if ch == '\\' && i + 1 < str.length
+          nxt = str[i + 1]
+          if ['\\', '/', '#', '!'].include?(nxt)
+            out << nxt
+            i += 2
+            next
+          end
+        end
+        out << ch
+        i += 1
+      end
+      out
+    end
+    def unescaped_delim?(str, index)
+      backslashes = 0
+      i = index - 1
+      while i >= 0 && str[i] == '\\'
+        backslashes += 1
+        i -= 1
+      end
+      backslashes.even?
+    end
+    def parse_matcher(raw)
+      trimmed = raw.strip
+      return nil if trimmed.empty?
+      if trimmed.start_with?("/") && trimmed.length >= 2 && trimmed.end_with?("/")
+        return nil unless unescaped_delim?(trimmed, trimmed.length - 1)
+        body = trimmed[1..-2]
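+        # Unescape only the rule-file delimiters (\/, \#, \!). Other escapes,
+        # including \\, stay intact so the regex keeps its meaning.
+        source = body.gsub(/\\(.)/) { ["/", "#", "!"].include?($1) ? $1 : $& }
+        pattern = Regexp.new(source)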
+        return Matcher.new(:regex, pattern, trimmed)
+      end
+      literal = unescape_token(trimmed)
+      Matcher.new(:literal, literal, trimmed)
+    end
+    def build_rules(path)
+      rules = []
+      File.readlines(path, chomp: true).each do |line|
+        content = strip_comment(line).strip
+        next if content.empty?
+        parts = split_unescaped_bang(content)
+        base = parse_matcher(parts.shift || "")
+        next unless base
+        excludes = parts.map { |part| parse_matcher(part) }.compact
+        rules << Rule.new(base, base.label, excludes)
+      end
+      rules
+    end
+  end
+  module MatcherEngine
+    module_function
+    def match_spans(line, matcher)
+      spans = []
+      if matcher.type == :regex
+        line.scan(matcher.pattern) do
+          text = Regexp.last_match(0)
+          next if text.nil? || text.empty?
+          start_idx = Regexp.last_match.begin(0)
+          end_idx = Regexp.last_match.end(0)
+          spans << [start_idx, end_idx, text]
+        end
+      else
+        needle = matcher.pattern
+        return spans if needle.empty?
+        start_idx = 0
+        while (found = line.index(needle, start_idx))
+          spans << [found, found + needle.length, needle]
+          start_idx = found + 1
+        end
+      end
+      spans
+    end
+    def excluded?(match_span, exclude_spans)
+      match_start, match_end = match_span[0], match_span[1]
+      exclude_spans.any? do |spans|
+        spans.any? do |span|
+          span_start, span_end = span[0], span[1]
+          span_start < match_end && span_end > match_start
+        end
+      end
+    end
+  end
+  module RgBackend
+    module_function
+    def available?
+      system("rg", "--version", out: File::NULL, err: File::NULL)
+    end
+    def prefilter_lines(files, rules)
+      literals = rules.map { |rule| rule.matcher.pattern }.uniq
+      return {} if literals.empty?
+      Tempfile.create("ngworder_rg") do |tmp|
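+        # Pass the literals verbatim and let rg treat them as fixed strings;
+        # Ruby's Regexp.escape output is not guaranteed to be valid rg regex syntax.
+        literals.each { |literal| tmp.puts literal }
+        tmp.flush
+        cmd = ["rg", "--fixed-strings", "--line-number", "--with-filename", "--no-heading", "--color=never", "-f", tmp.path, "--"] + files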
+        results = Hash.new { |hash, key| hash[key] = [] }
+        IO.popen(cmd, "r") do |io|
+          io.each_line do |line|
+            path, line_no, content = line.chomp.split(":", 3)
+            next unless path && line_no
+            results[path] << [line_no.to_i, content || ""]
+          end
+        end
+        results
+      end
+    rescue Errno::ENOENT
+      nil
+    end
+  end
+  class CLI
+    def self.run(argv)
+      options = {}
+      OptionParser.new do |opts|
+        opts.banner = "Usage: ngworder [--rule=NGWORDS.txt] <files...>"
+        opts.on("--rule=PATH", "Rules file path (default: NGWORDS.txt)") { |value| options[:rule] = value }
+        opts.on("--rg", "Use ripgrep for literal-only prefiltering") { options[:rg] = true }
+        opts.on("-h", "--help", "Show help") do
+          puts opts
+          return 0
+        end
+      end.parse!(argv)
+      rule_path = options[:rule]
+      rule_path = "NGWORDS.txt" if rule_path.nil? || rule_path.strip.empty?
+      if argv.empty?
+        warn "No input files provided"
+        return 2
+      end
+      unless File.file?(rule_path)
+        warn "Rules file not found: #{rule_path}"
+        return 2
+      end
+      rules = Parser.build_rules(rule_path)
+      warn "No rules loaded from #{rule_path}" if rules.empty?
+      found = false
+      argv.each do |path|
+        warn "Skip missing file: #{path}" unless File.file?(path)
+      end
+      existing_files = argv.select { |path| File.file?(path) }
+      literal_rules, regex_rules = rules.partition { |rule| rule.matcher.type == :literal }
+      rg_enabled = options[:rg] && RgBackend.available?
+      warn "rg not found; falling back to Ruby scan" if options[:rg] && !rg_enabled
+      # nil (rg disabled, no literal rules, or rg could not be run) means
+      # every line is scanned in Ruby.
+      rg_lines = if rg_enabled && !literal_rules.empty?
+                   RgBackend.prefilter_lines(existing_files, literal_rules)
+                 end
+      existing_files.each do |path|
+        if rg_lines && regex_rules.empty?
+          candidates = rg_lines[path]
+          next if candidates.nil? || candidates.empty?
+          candidates.each do |line_no, line|
+            literal_rules.each do |rule|
+              matches = MatcherEngine.match_spans(line, rule.matcher)
+              next if matches.empty?
+              exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
+              matches.each do |span|
+                next if MatcherEngine.excluded?(span, exclude_spans)
+                found = true
+                col_no = span[0] + 1
+                puts "#{path}:#{line_no}:#{col_no}  #{span[2]}  NG:#{rule.label}"
+              end
+            end
+          end
+          next
+        end
+        candidate_lines = rg_lines ? rg_lines[path] : nil
+        candidate_set = if candidate_lines
+                          candidate_lines.each_with_object({}) { |(line_no, _line), acc| acc[line_no] = true }
+                        else
+                          nil
+                        end
+        File.readlines(path, chomp: true).each_with_index do |line, idx|
+          line_no = idx + 1
+          regex_rules.each do |rule|
+            matches = MatcherEngine.match_spans(line, rule.matcher)
+            next if matches.empty?
+            exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
+            matches.each do |span|
+              next if MatcherEngine.excluded?(span, exclude_spans)
+              found = true
+              col_no = span[0] + 1
+              puts "#{path}:#{line_no}:#{col_no}  #{span[2]}  NG:#{rule.label}"
+            end
+          end
+          next if candidate_set && !candidate_set.key?(line_no)
+          literal_rules.each do |rule|
+            matches = MatcherEngine.match_spans(line, rule.matcher)
+            next if matches.empty?
+            exclude_spans = rule.excludes.map { |ex| MatcherEngine.match_spans(line, ex) }
+            matches.each do |span|
+              next if MatcherEngine.excluded?(span, exclude_spans)
+              found = true
+              col_no = span[0] + 1
+              puts "#{path}:#{line_no}:#{col_no}  #{span[2]}  NG:#{rule.label}"
+            end
+          end
+        end
+      end
+      found ? 1 : 0
+    end
+  end
+end
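
`CLI.run` returns 0 when nothing is reported, 1 when at least one NG word is found, and 2 when no input files are given or the rules file is missing. A script that wraps the command might translate those codes as sketched below. `interpret_status` is a hypothetical helper, and the commented usage assumes the `ngworder` executable is on `PATH`.

```ruby
# Hypothetical helper: map the exit status returned by CLI.run to a symbol.
def interpret_status(code)
  case code
  when 0 then :clean            # no NG words reported
  when 1 then :ng_words_found   # at least one match was printed
  when 2 then :usage_error      # no input files, or rules file not found
  else :unknown
  end
end

# Example use in a wrapper script (assumes `ngworder` is installed):
#   system("ngworder", "--rule=NGWORDS.txt", "README.md")
#   abort "NG words found" if interpret_status($?.exitstatus) == :ng_words_found

puts interpret_status(1)
```

An uncaught exception, for example from an unknown option or a malformed regex rule, also makes Ruby exit with status 1, so a status of 1 is not exclusively "NG words found".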

metadata ADDED

@@ -0,0 +1,51 @@
+--- !ruby/object:Gem::Specification
+name: ngworder
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Masanori Kado
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2026-01-11 00:00:00.000000000 Z
+dependencies: []
+description: CLI tool to scan files for NG words with per-rule exclusions.
+email:
+- kdmsnr@gmail.com
+executables:
+- ngworder
+extensions: []
+extra_rdoc_files: []
+files:
+- AGENTS.md
+- LICENSE
+- NGWORDS.txt
+- README.md
+- bin/ngworder
+- lib/ngworder.rb
+- lib/ngworder/version.rb
+homepage: https://github.com/kdmsnr/ngworder
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.7'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.0.3.1
+signing_key:
+specification_version: 4
+summary: Extract NG words from Japanese text using simple rule files
+test_files: []
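
The `required_ruby_version` entry above (`>= 2.7`) is what RubyGems checks at install time. A minimal sketch of evaluating the same constraint by hand follows. `ruby_supported?` is a hypothetical helper, and the requirement string is copied from the metadata.

```ruby
require "rubygems"

# Hypothetical helper: does `version_string` satisfy the gem's Ruby requirement?
def ruby_supported?(version_string, requirement = ">= 2.7")
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version_string))
end

# Check the interpreter that is running this script.
puts ruby_supported?(RUBY_VERSION)
```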