RubyGems - tans-parser - Versions diffs - 0.1.4 → 0.1.5 - Mend

tans-parser 0.1.4 → 0.1.5


Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 907b90ef203876bd99cc0dbca8eb6b184ed6472b580c6d2918e31469d0f7cc12
-  data.tar.gz: b828065265563752bb2acf5ef380c2ba805f4b4e785fc602337c2b5048267824
+  metadata.gz: 0ea2a71fb0db321851e51510e4250d9c2afcd2830495771ebb3d0adf36e707b0
+  data.tar.gz: 7198b01f46232a42fc80b8d09668b07dc20834c47d16e0a5502d1bb18b7a5d8f
 SHA512:
-  metadata.gz: ae5ea0f42c3663d0edfc35e86bb11064396099f73beebcdcf2fe5d43cfc2b65d810ab2dc055cf5d186a3571aa11703abc006962407a30b979c1a7d4d6e46f3f7
-  data.tar.gz: b3379404c4cee09f3e49ceeaaa0e3ec90322a665fbb671a7d5e19506759b72e338fce4e782cc2017779d7812eadfddc098df82ba6bd0cb4ecec50a283d4ce044
+  metadata.gz: 165b2cd6b7cc5c0edb4bf4dd510fca304bd48fdcd0bbc7757d9edaf9ef4ef9fd76e3bfc39708301eb16ad263df2a3ec7f2b450ccca9a9b9ed9c24aca64f68f0b
+  data.tar.gz: 8eaede4178263a217f715d9ee79e263aff0169801bb7f9eecc6b82a81608e32f8c442788e17afff67e3a6bd68165de6612a5ca0a80d661b3711fc092b4871561

data/CHANGELOG.md CHANGED

@@ -1,5 +1,34 @@
 # CHANGELOG
+## 0.1.5
+- **Confidence scoring** — each detected element now carries a `confidence` value (0.0–1.0):
+  - `Element#confident?` — returns `true` when confidence ≥ 0.5 (or nil, for backward compatibility)
+  - Scoring heuristics per role:
+    - **Buttons**: 0.9 (square), 0.85 (round), 0.75 (angle); −0.2 penalty for single-character text
+    - **Checkboxes**: 0.9 (checked), 0.85 (unchecked)
+    - **Dialogs**: 0.9 base; +0.05 bonus for titled borders (text in top border)
+    - **Statusbars**: 0.9 (inverse colors), 0.85 (separator-preceded footer), 0.5 (≥30-char fallback)
+    - **Progress bars**: 0.9 (incomplete), 0.95 (100% complete)
+    - **Inputs**: 0.9
+    - **Labels**: 0.8 (single-word), 0.85 (multi-word)
+    - **Menus**: 0.9 (3+ items), 0.85 (2 items), 0.8 (dropdown `> Item`)
+    - **Tabs**: 0.85 (3+ tabs), 0.7 (2 tabs); +0.05 for focused tab
+    - **Annotations**: 1.0 (manually annotated); can be overridden via `confidence:` keyword
+  - `confidence` included in `Element#to_h` when set; excluded when nil (backward compatible)
+- **Reduced false positives** — tighter heuristics to avoid misdetection:
+  - **Buttons**: skip numeric-only brackets (e.g. `[12]`, `[3]`)
+  - **Labels**: skip URL schemes (`https://example.com`) and time patterns (`Meeting at 3:00`)
+  - **Progress bars**: minimum width of 6 characters (`[##]` is no longer detected)
+- **Negative tests** — 15 new tests covering edge cases (numeric brackets, URLs, time patterns, short progress bars, tabs across rows, incomplete boxes, prompt-like menus, etc.)
+- **Confidence tests** — 15 new tests verifying scoring for every element type and edge case (titled dialogs, focused tabs, complete progress bars, annotation override, etc.)
+- **Benchmarks** — `benchmark-ips` suite for parser, diff, and selector:
+  - `benchmarks/parser_benchmark.rb` — plain, ANSI, cursor, complex, and dialog-like workloads
+  - `benchmarks/diff_benchmark.rb` — full, chars_only, and ignore_rows modes
+  - `benchmarks/selector_benchmark.rb` — full scan with mixed UI elements
+  - `benchmark-ips` added as development dependency
+- 30 new tests, 374 total, 100% line and branch coverage maintained
 ## 0.1.4
 - **Unicode width support** — correct display width for CJK, emoji, and combining characters:

data/README.md CHANGED

@@ -152,7 +152,7 @@ selector.statusbars  # => includes annotated statusbar
 # Annotations accept extra attributes
 state.annotate_role(:button, row: 0, col: 0, width: 6, height: 1,
-                    text: "Submit", fg: "green", disabled: false)
+                    text: "Submit", fg: "green", disabled: false, confidence: 0.8)
 ```
 ### State comparison (diff)
@@ -197,13 +197,15 @@ el.height    # => 1
 el.checked   # => true/false/nil
 el.focused   # => true/false/nil
 el.disabled  # => true/false/nil
+el.confidence # => 0.9 (Float 0.0-1.0) or nil when not set
 el.fg        # => "default"
 el.bg        # => "default"
-el.to_h      # => {role: :button, text: "OK", row: 1, col: 2, ...}
+el.to_h      # => {role: :button, text: "OK", row: 1, col: 2, confidence: 0.9, ...}
 # Predicates
 el.checked?   # => false (always boolean)
 el.disabled?  # => false (always boolean)
+el.confident? # => true when confidence >= 0.5 (or nil)
 # Geometry
 el.bounds     # => {row: 1, col: 2, width: 4, height: 1}
@@ -214,6 +216,49 @@ el.type("hello")    # => {action: :type, target: el, row: 1, col: 4, text: "hell
 el.press_key(:tab)  # => {action: :press_key, target: el, key: :tab}
 ```
+### Confidence scoring
+Each detected element carries a `confidence` value (0.0–1.0) reflecting how sure the heuristics are:
+```ruby
+btn = selector.button
+btn.confidence  # => 0.9 (square-bracket buttons are high confidence)
+btn.confident?  # => true
+# Low-confidence detections can be filtered out
+reliable = selector.buttons.select(&:confident?)  # confidence >= 0.5
+```
+Confidence values per role and context:
+| Role | Scenario | Confidence |
+|------|----------|------------|
+| `:button` | `[ OK ]` square brackets | 0.9 |
+| `:button` | `(Cancel)` round brackets | 0.85 |
+| `:button` | `<Submit>` angle brackets | 0.75 |
+| `:button` | Single-character text | −0.2 penalty |
+| `:checkbox` | `[x]` checked | 0.9 |
+| `:checkbox` | `[ ]` unchecked | 0.85 |
+| `:input` | `[________]` underscore brackets | 0.9 |
+| `:label` | `Project Name:` (multi-word) | 0.85 |
+| `:label` | `Username:` (single-word) | 0.8 |
+| `:menu` | 3+ items on menu bar | 0.9 |
+| `:menu` | 2 items on menu bar | 0.85 |
+| `:menu` | `> Item` dropdown | 0.8 |
+| `:tab` | 3+ tabs | 0.85 |
+| `:tab` | 2 tabs | 0.7 |
+| `:tab` | Focused tab (underline/bg) | +0.05 bonus |
+| `:dialog` | Complete box with all 4 corners | 0.9 |
+| `:dialog` | Titled border (text in top border) | 0.95 |
+| `:statusbar` | Inverse colors + ≥3 colored cells | 0.9 |
+| `:statusbar` | Separator-preceded footer | 0.85 |
+| `:statusbar` | Fallback (≥30 chars, no bg info) | 0.5 |
+| `:progress` | `[#####     ]` incomplete | 0.9 |
+| `:progress` | `[##########]` 100% complete | 0.95 |
+| Annotation | Manually annotated via `annotate_role` | 1.0 |
+`confidence` is excluded from `to_h` when nil (backward compatible).
 ### Recognized element patterns
 | Role | Pattern | Example |
@@ -241,8 +286,9 @@ Each cell is a Hash with these keys:
 | `italic` | Boolean | Italic style |
 | `underline` | Boolean | Underline style |
 | `blink` | Boolean | Blink style |
+| `width` | Integer | Display width (1 for normal, 2 for CJK/emoji, 0 for continuation) |
-Default cell: `{char: " ", fg: "default", bg: "default", bold: false, italic: false, underline: false, blink: false}`
+Default cell: `{char: " ", fg: "default", bg: "default", bold: false, italic: false, underline: false, blink: false, width: 1}`
 ## Supported ANSI sequences
@@ -255,7 +301,8 @@ Default cell: `{char: " ", fg: "default", bg: "default", bold: false, italic: fa
 - **Cursor style** — DECSCUSR
 - **Mouse tracking** — DEC private modes 1000, 1002, 1003, 1006
 - **ISO 2022** — G0/G1 charset switching, DEC Special Graphics
-- **UTF-8** — Multi-byte characters including emoji
+- **UTF-8** — Multi-byte characters including CJK, emoji (correct display width via `unicode-display_width`)
+- **Combining characters** — Zero-width combining marks appended to previous cell
 ## License
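
Note on the README change: the new "Confidence scoring" section filters on the fixed 0.5 cutoff built into `confident?`. A caller who wants a stricter cutoff could filter on `confidence` directly. The sketch below is an illustration only: `ReadmeElement` is a reduced stand-in for `TansParser::Element`, and `at_least` is a hypothetical helper that does not exist in the gem.

```ruby
# Reduced, hypothetical stand-in for TansParser::Element.
ReadmeElement = Struct.new(:role, :text, :confidence, keyword_init: true)

# Hypothetical helper (not part of tans-parser): keep elements at or above a
# caller-chosen threshold. A nil confidence passes, mirroring the
# backward-compatible behaviour of Element#confident?.
def at_least(elements, threshold)
  elements.select { |el| el.confidence.nil? || el.confidence >= threshold }
end

elements = [
  ReadmeElement.new(role: :button, text: "OK", confidence: 0.9),
  ReadmeElement.new(role: :statusbar, text: "fallback footer", confidence: 0.5),
  ReadmeElement.new(role: :button, text: "legacy"), # confidence not set
]

strict = at_least(elements, 0.8)
puts strict.map(&:text).inspect
```

With a 0.8 threshold, the statusbar fallback (0.5) is dropped while the unscored element is kept.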

data/lib/tans_parser/element.rb CHANGED

@@ -11,6 +11,7 @@ module TansParser
     :focused,
     :fg, :bg,
     :disabled,
+    :confidence,
     keyword_init: true,
   ) do
     def checked?
@@ -37,6 +38,10 @@ module TansParser
       { action: :press_key, target: self, key: key }
     end
+    def confident?
+      confidence.nil? || confidence >= 0.5
+    end
     def to_h
       {
         role: role,
@@ -47,6 +52,7 @@ module TansParser
         focused: focused,
         fg: fg, bg: bg,
         disabled: disabled,
+        confidence: confidence,
       }.compact
     end
   end
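
Note on the `element.rb` change: the behaviour of the two touched methods can be reproduced in isolation. The sketch below assumes a reduced struct (`ElementSketch`, a hypothetical name with only a few of the real members) and shows that `confident?` treats nil as confident and that `.compact` is what keeps `confidence` out of `to_h` when it is unset.

```ruby
# Reduced stand-in for TansParser::Element; only a few members are kept.
ElementSketch = Struct.new(:role, :text, :disabled, :confidence, keyword_init: true) do
  # nil counts as confident, so elements built without a score keep working.
  def confident?
    confidence.nil? || confidence >= 0.5
  end

  # Hash#compact drops nil values, so :confidence appears only when it was set.
  def to_h
    { role: role, text: text, disabled: disabled, confidence: confidence }.compact
  end
end

scored   = ElementSketch.new(role: :button, text: "OK", confidence: 0.9)
unscored = ElementSketch.new(role: :button, text: "OK")
weak     = ElementSketch.new(role: :statusbar, text: "footer", confidence: 0.4)

p scored.to_h       # includes :confidence
p unscored.to_h     # no :confidence key
p weak.confident?   # below the 0.5 cutoff
```

Because `compact` removes only nil, a `disabled: false` value would still be emitted while `disabled: nil` would not.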

data/lib/tans_parser/selector.rb CHANGED

@@ -147,7 +147,7 @@ module TansParser
     # Detects annotations: manually annotated roles from State#annotate_role
     def detect_annotations
-      @state.annotations.map { |a| Element.new(a) }
+      @state.annotations.map { |a| Element.new(**a, confidence: a[:confidence] || 1.0) }
     end
     # Detects buttons: [ OK ], (Cancel), <Submit>
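
Note on the `detect_annotations` change: a keyword written after `**a` takes precedence over the same key inside `a`, so the `a[:confidence] || 1.0` expression decides the final value. The sketch below assumes annotations are plain attribute hashes; `AnnotatedElement` is a hypothetical stand-in for the real struct.

```ruby
# Hypothetical stand-in for TansParser::Element.
AnnotatedElement = Struct.new(:role, :text, :confidence, keyword_init: true)

# Assumed shape of State#annotations: an array of attribute hashes.
annotations = [
  { role: :button, text: "Submit" },                  # no explicit score
  { role: :button, text: "Maybe", confidence: 0.8 },  # caller override
  { role: :button, text: "Zero", confidence: 0.0 },   # 0.0 is truthy in Ruby
]

# The explicit confidence: keyword overrides any :confidence key coming from **a.
annotated = annotations.map do |a|
  AnnotatedElement.new(**a, confidence: a[:confidence] || 1.0)
end

p annotated.map(&:confidence)
```

Since `0.0` is truthy in Ruby, an explicit zero survives the `||` fallback; only nil (or false) is replaced by 1.0.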
@@ -163,8 +163,18 @@ module TansParser
           next if text.empty?
           next if text.match?(/^_+$/)
           next if text.match?(/^[ xX*]$/) # skip checkbox markers
+          next if text.match?(/^\d+$/)    # skip numeric-only brackets (e.g. [12])
           col = ::Regexp.last_match.begin(0)
+          confidence = if ::Regexp.last_match[1]
+                         0.9
+                       elsif ::Regexp.last_match[2]
+                         0.85
+                       else
+                         0.75
+                       end
+          confidence -= 0.2 if text.length == 1 # penalize single-character buttons
           buttons << Element.new(
             role: :button,
             text: text,
@@ -172,6 +182,7 @@ module TansParser
             width: ::Regexp.last_match[0].length, height: 1,
             fg: row[col][:fg],
             bg: row[col][:bg],
+            confidence: confidence,
           )
         end
       end
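
Note on the button hunk: the scoring depends on which capture group matched, and `match?` does not update `Regexp.last_match`, so the group lookups after the `next if text.match?(...)` guards still refer to the button match. The sketch below reproduces the guards and the scoring under stated assumptions: the gem's actual button regexp is outside the hunk, so `BUTTON_PATTERN` is a hypothetical substitute, and the sketch rounds the score, which the diff does not do.

```ruby
# Hypothetical pattern with one capture group per bracket style
# (square, round, angle). Not the gem's actual regexp.
BUTTON_PATTERN = /\[\s*([^\[\]]+?)\s*\]|\(\s*([^()]+?)\s*\)|<\s*([^<>]+?)\s*>/

def button_confidences(line)
  results = []
  line.scan(BUTTON_PATTERN) do
    m = Regexp.last_match
    text = (m[1] || m[2] || m[3]).strip
    next if text.empty?
    next if text.match?(/^_+$/)      # underscore-only: input field
    next if text.match?(/^[ xX*]$/)  # checkbox markers
    next if text.match?(/^\d+$/)     # numeric-only brackets such as [12]

    score = if m[1] then 0.9 elsif m[2] then 0.85 else 0.75 end
    score -= 0.2 if text.length == 1 # single-character penalty
    results << [text, score.round(2)]
  end
  results
end

p button_confidences("[ OK ] (Cancel) <Submit> [12] [Y] [x]")
```

A single-character square-bracket button ends up at 0.7, still above the 0.5 cutoff used by `confident?`, whereas a single-character angle-bracket button would land at 0.55.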
@@ -190,12 +201,14 @@ module TansParser
         checked = match[2] != " "
         label_text = match[3].strip
         col = match.begin(3)
+        confidence = checked ? 0.9 : 0.85
         checkboxes << Element.new(
           role: :checkbox,
           text: label_text,
           row: r, col: col,
           width: label_text.length, height: 1,
           checked: checked,
+          confidence: confidence,
         )
       end
       checkboxes
@@ -218,11 +231,16 @@ module TansParser
           if bottom_r
             height = bottom_r - r + 1
             text = extract_dialog_text(r + 1, tl_idx + 1, width - 2, height - 2)
+            confidence = 0.9
+            # Bonus for titled borders (text in top border)
+            top_border = line[tl_idx..(tl_idx + width - 1)]
+            confidence = (confidence + 0.05).round(2) if top_border.match?(/[A-Za-z]/)
             dialogs << Element.new(
               role: :dialog,
               text: text,
               row: r, col: tl_idx,
               width: width, height: height,
+              confidence: confidence,
             )
           end
           tl_idx += 1
@@ -291,6 +309,7 @@ module TansParser
           row: row_idx, col: 0,
           width: row.length, height: 1,
           bg: non_default.first[:bg],
+          confidence: 0.9,
         )
         return bars
       end
@@ -303,6 +322,7 @@ module TansParser
           role: :statusbar, text: text,
           row: grid.length - 1, col: 0,
           width: last_row.length, height: 1,
+          confidence: 0.5,
         )
         return bars
       end
@@ -326,6 +346,7 @@ module TansParser
           role: :statusbar, text: text,
           row: r, col: 0,
           width: row.length, height: 1,
+          confidence: 0.85,
         )
         return bars
       end
@@ -341,16 +362,19 @@ module TansParser
         line = row.map { |c| c[:char] }.join
         match = line.match(/\[([#>=-]+)\s*\]/)
         next unless match
+        next if match[0].length < 6 # skip too-short brackets (e.g. [##])
         filled = match[1]
         total = match[0].length - 2
         percent = (filled.length.to_f / total * 100).round
+        confidence = percent == 100 ? 0.95 : 0.9
         bars << Element.new(
           role: :progress,
           text: "#{percent}%",
           row: r, col: ::Regexp.last_match.begin(0),
           width: match[0].length, height: 1,
           checked: percent == 100,
+          confidence: confidence,
         )
       end
       bars
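
Note on the progress-bar hunk: the width guard and the percentage calculation can be checked on their own. The sketch below reuses the regexp and arithmetic from the diff; the method name `progress_info` and the returned hash shape are hypothetical.

```ruby
# Returns a hash with the computed percent and confidence, or nil when the
# line has no bar or the bar is shorter than 6 characters including brackets.
def progress_info(line)
  match = line.match(/\[([#>=-]+)\s*\]/)
  return nil unless match
  return nil if match[0].length < 6  # "[##]" and similar are too short

  total = match[0].length - 2        # characters between the brackets
  percent = (match[1].length.to_f / total * 100).round
  { percent: percent, confidence: percent == 100 ? 0.95 : 0.9 }
end

p progress_info("[#####     ]")
p progress_info("[##########]")
p progress_info("[##]")
```

The shortest bar that still passes the guard is four fill characters wide, for example `[####]`.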
@@ -369,6 +393,7 @@ module TansParser
             text: "",
             row: r, col: col,
             width: m[0].length, height: 1,
+            confidence: 0.9,
           )
         end
       end
@@ -385,13 +410,17 @@ module TansParser
         label_text = match[1].strip.sub(/:$/, "").strip
         next if label_text.empty? || label_text.length < 2
+        next if match[1].match?(/\d:/)            # skip patterns ending with digit before colon (e.g. "Meeting at 3:")
+        next if line[match.end(1), 2] == "//"     # skip URL schemes (e.g. "https://example.com")
         col = match.begin(1)
+        confidence = label_text.include?(" ") ? 0.85 : 0.8 # multi-word labels are stronger signals
         labels << Element.new(
           role: :label,
           text: label_text,
           row: r, col: col,
           width: match[1].length, height: 1,
+          confidence: confidence,
         )
       end
       labels
@@ -411,11 +440,13 @@ module TansParser
           items = stripped.split(/\s{2,}/)
           if items.length >= 2 && items.all? { |i| i.match?(/^[A-Za-z]/) }
             col = line.index(stripped)
+            confidence = items.length >= 3 ? 0.9 : 0.85
             menus << Element.new(
               role: :menu,
               text: items.join(" | "),
               row: r, col: col || 0,
               width: line.length, height: 1,
+              confidence: confidence,
             )
           end
         end
@@ -428,6 +459,7 @@ module TansParser
             text: m[0].sub(/^>\s*/, "").strip,
             row: r, col: m.begin(0),
             width: m[0].length, height: 1,
+            confidence: 0.8,
           )
         end
       end
@@ -453,12 +485,15 @@ module TansParser
           cell = row[m.begin(0)]
           focused = cell[:underline] || cell[:bg] != "default"
+          base_confidence = matches.length >= 3 ? 0.85 : 0.7
+          confidence = focused ? [base_confidence + 0.05, 0.9].min.round(2) : base_confidence
           tabs << Element.new(
             role: :tab,
             text: tab_text,
             row: r, col: m.begin(0),
             width: m[0].length, height: 1,
             focused: focused,
+            confidence: confidence,
           )
         end
       end
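
Note on the tab hunk: the focused bonus is capped at 0.9, which matters when reading the README table entry "+0.05 bonus". With the two base values in the diff the cap is reached for a focused tab among three or more, but never exceeded. The sketch below isolates that arithmetic; `tab_confidence` is a hypothetical helper name.

```ruby
# tab_count: number of tab matches found on the row
# focused:   whether the tab cell is underlined or has a non-default background
def tab_confidence(tab_count, focused)
  base = tab_count >= 3 ? 0.85 : 0.7
  focused ? [base + 0.05, 0.9].min.round(2) : base
end

[[3, true], [3, false], [2, true], [2, false]].each do |count, focused|
  puts "#{count} tabs, focused=#{focused}: #{tab_confidence(count, focused)}"
end
```

The `round(2)` call keeps floating-point noise from the addition out of the stored value.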

data/lib/tans_parser/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module TansParser
-  VERSION = "0.1.4"
+  VERSION = "0.1.5"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: tans-parser
 version: !ruby/object:Gem::Version
-  version: 0.1.4
+  version: 0.1.5
 platform: ruby
 authors:
 - Haluk Durmus
@@ -121,6 +121,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.22'
+- !ruby/object:Gem::Dependency
+  name: benchmark-ips
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.13'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.13'
 - !ruby/object:Gem::Dependency
   name: unicode-display_width
   requirement: !ruby/object:Gem::Requirement