mac_ocr 0.1.0-universal-darwin
- checksums.yaml +7 -0
- data/CHANGELOG.md +22 -0
- data/LICENSE +21 -0
- data/README.md +126 -0
- data/bin/mac_ocr_helper +0 -0
- data/ext/mac_ocr_helper/build.sh +37 -0
- data/ext/mac_ocr_helper/main.swift +183 -0
- data/lib/mac_ocr/error.rb +18 -0
- data/lib/mac_ocr/helper.rb +25 -0
- data/lib/mac_ocr/ocr.rb +106 -0
- data/lib/mac_ocr/version.rb +3 -0
- data/lib/mac_ocr.rb +7 -0
- data/mac_ocr.gemspec +35 -0
- metadata +58 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 1f8090958f7f18af8c41be1f49104218a321ce656d13c3903bfcdff596260dd2
+  data.tar.gz: e2fb9cede71b148295f49fd7e58d1530ee495f375521256a66ca3ba07acc602d
+SHA512:
+  metadata.gz: 8859ca4fd31d805ebb51947940c66bc010509a4b9412cdbe52d50a9a5088710f9d8ac83ef071cc379084ee2ea6f75ef6561f897e90de2e4067634ac259e08ef7
+  data.tar.gz: 140136c38f73e19a321ba546c43a55d20ada11eb4bcaeef9a2646b5f26d022a65af2683408c1e07783e64d4be7981a3daab30944a13a82e1cd31919cc4188d36
data/CHANGELOG.md
ADDED
@@ -0,0 +1,22 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [0.1.0] - 2026-05-27
+
+### Added
+- `MacOcr::OCR#recognize` returns `[text, confidence, [x, y, w, h]]` tuples
+  using Apple's Vision `VNRecognizeTextRequest`.
+- Bundled Swift CLI helper (`bin/mac_ocr_helper`) shipped as a universal
+  arm64 + x86_64 binary targeting macOS 11+.
+- Constructor options: `recognition_level:` (`:accurate` | `:fast`) and
+  `language_preference:` (Array of BCP-47 language codes).
+- Structured exception hierarchy: `MacOcr::Error`, `InvalidArgumentError`,
+  `HelperExecutionError`, `HelperNotFoundError`.
+- Rake tasks: `helper:build`, `helper:clean`, `release:prepare`, `test`.
+- Minitest suite with a KR+EN fixture image.
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Team Milestone
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,126 @@
+# mac_ocr
+
+Ruby OCR on macOS via the Apple Vision framework.
+
+`mac_ocr` is a small Ruby gem that wraps a bundled Swift helper which calls
+[`VNRecognizeTextRequest`](https://developer.apple.com/documentation/vision/vnrecognizetextrequest).
+It returns recognized text, per-line confidence, and normalized bounding
+boxes — the same shape exposed by the Python package
+[`ocrmac`](https://github.com/straussmaximilian/ocrmac), so code written
+against ocrmac ports over with minimal changes.
+
+- macOS only (the gem targets `universal-darwin`, macOS 11+)
+- Ruby 3.0+
+- No runtime dependencies beyond the bundled helper
+
+## Installation
+
+From rubygems (once published):
+
+```ruby
+# Gemfile
+gem "mac_ocr"
+```
+
+Directly from GitHub:
+
+```ruby
+# Gemfile
+gem "mac_ocr", git: "https://github.com/TeamMilestone/mac_ocr"
+```
+
+The gem ships with a pre-built universal (arm64 + x86_64) helper binary at
+`bin/mac_ocr_helper`, so no compilation is required at install time. If you
+clone the repo to develop, build it once with `rake helper:build` (needs
+Xcode Command Line Tools for `swiftc` and `lipo`).
+
+## Usage
+
+```ruby
+require "mac_ocr"
+
+ocr = MacOcr::OCR.new(
+  "image.png",
+  recognition_level: :accurate, # :accurate (default) or :fast
+  language_preference: ["ko-KR", "en-US"] # optional, nil for default
+)
+
+ocr.recognize
+# => [
+#   ["Hello, world!", 1.0, [0.041, 0.575, 0.400, 0.237]],
+#   ["안녕, 세상!", 1.0, [0.041, 0.171, 0.321, 0.282]]
+# ]
+```
+
+Each row is `[text, confidence, [x, y, width, height]]`. Bounding box
+values are in Vision's normalized coordinate space: floats in `[0, 1]`,
+origin at the **bottom-left** of the image. No coordinate conversion is
+performed by the Ruby layer — this matches ocrmac's contract.
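The bottom-left origin matters as soon as the boxes are drawn with a tool that uses a top-left pixel origin. A minimal sketch of that conversion, assuming the image's pixel dimensions are already known; `to_pixel_rect` is a hypothetical helper name and is not part of mac_ocr:

```ruby
# Hypothetical helper (not part of mac_ocr): convert a Vision-style
# normalized bbox [x, y, w, h] (origin bottom-left) into a pixel
# rectangle [left, top, width, height] with a top-left origin.
def to_pixel_rect(bbox, image_width, image_height)
  x, y, w, h = bbox
  left = x * image_width
  # Vision's y measures the box's bottom edge from the image bottom,
  # so the top edge measured from the image top is 1 - (y + h).
  top = (1.0 - (y + h)) * image_height
  [left, top, w * image_width, h * image_height].map(&:round)
end

rect = to_pixel_rect([0.25, 0.5, 0.5, 0.25], 800, 400)
# rect == [200, 100, 400, 100]
```

Rounding to whole pixels is a choice made for this sketch; callers that need sub-pixel precision can drop the `map(&:round)`.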
+
+### Porting from ocrmac
+
+The captcha-style helper from Python ports almost verbatim:
+
+```ruby
+def run_ocr(image_path)
+  items = MacOcr::OCR.new(
+    image_path,
+    recognition_level: :accurate,
+    language_preference: ["ko-KR", "en-US"]
+  ).recognize
+
+  # Sort top-to-bottom: larger Vision y is higher on the page.
+  items.sort_by { |_text, _conf, bbox| -bbox[1] }.map { |t, _c, _b| t }
+end
+```
+
+### Errors
+
+| Exception | Raised when |
+|---|---|
+| `MacOcr::InvalidArgumentError` | Bad constructor args, missing image file, unsupported language code |
+| `MacOcr::HelperExecutionError` | Vision request failed, helper produced unexpected output (carries `code`, `exit_status`, `stderr`) |
+| `MacOcr::HelperNotFoundError` | Bundled binary missing or not executable (e.g. forgot `rake helper:build` in a checkout) |
+
+All inherit from `MacOcr::Error`, so `rescue MacOcr::Error` catches them all.
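Because the classes share one base, rescue clauses can go from most specific to least specific. A minimal sketch of that ordering follows. The `MacOcr` definitions here are stand-ins copied from `lib/mac_ocr/error.rb` so the snippet runs without the gem (with the gem installed, `require "mac_ocr"` replaces them), and `classify_failure` is a hypothetical wrapper, not part of the gem:

```ruby
# Stand-in definitions mirroring lib/mac_ocr/error.rb, included only so
# this sketch is self-contained.
module MacOcr
  class Error < StandardError; end
  class HelperNotFoundError < Error; end
  class InvalidArgumentError < Error; end

  class HelperExecutionError < Error
    attr_reader :code, :exit_status, :stderr

    def initialize(message, code: nil, exit_status: nil, stderr: nil)
      super(message)
      @code = code
      @exit_status = exit_status
      @stderr = stderr
    end
  end
end

# Hypothetical wrapper: run the block and map any mac_ocr failure to a
# symbol. Specific classes are rescued first, the base class last.
def classify_failure
  yield
  :ok
rescue MacOcr::InvalidArgumentError
  :bad_input
rescue MacOcr::HelperExecutionError => e
  e.code == "VISION_ERROR" ? :vision_failed : :helper_failed
rescue MacOcr::Error
  :other_mac_ocr_error
end

result = classify_failure { raise MacOcr::HelperNotFoundError, "no binary" }
# result == :other_mac_ocr_error
```

In real use the block would wrap `MacOcr::OCR.new(path).recognize`.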
+
+## Architecture
+
+`MacOcr::OCR` shells out to `bin/mac_ocr_helper`, a Swift CLI that takes
+an image path plus flags and emits a JSON document:
+
+```
+mac_ocr_helper <image_path> [--level accurate|fast]
+                            [--languages ko-KR,en-US]
+                            [--confidence 0.0]
+```
+
+```json
+{ "results": [ { "text": "...", "confidence": 0.5, "bbox": [x, y, w, h] } ] }
+```
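To make the mapping from this JSON document to the Ruby return value concrete, here is a minimal sketch that parses a sample document into `[text, confidence, bbox]` tuples. The sample values are made up for illustration and `sample_stdout` is a hypothetical stand-in for the helper's stdout:

```ruby
require "json"

# Made-up helper output in the documented shape.
sample_stdout = <<~JSON
  {"results": [
    {"text": "Hello", "confidence": 1.0, "bbox": [0.1, 0.6, 0.4, 0.2]},
    {"text": "World", "confidence": 0.5, "bbox": [0.1, 0.2, 0.4, 0.2]}
  ]}
JSON

# Flatten each result object into a [text, confidence, bbox] tuple,
# the shape documented for OCR#recognize.
rows = JSON.parse(sample_stdout).fetch("results").map do |row|
  row.values_at("text", "confidence", "bbox")
end
```

`fetch` raises `KeyError` when `"results"` is absent; that strictness is a choice made for this sketch.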
+
+Process spawn overhead is a few milliseconds; Vision itself takes 50–200 ms
+per image, so the shell-out cost is negligible for typical workloads.
+
+See `docs/HANDOFF.md` for the design rationale (including why we chose a
+Swift CLI over a Ruby/FFI binding or an Objective-C bridge gem).
+
+## Development
+
+```bash
+bundle install
+rake helper:build # rebuild bin/mac_ocr_helper
+rake test # run minitest suite
+rake release:prepare # clean+build+test+gem build
+```
+
+## Credits
+
+This gem is a Ruby port of the Vision-framework surface of the Python
+package [`ocrmac`](https://github.com/straussmaximilian/ocrmac) by
+Maximilian Strauss. The public API and coordinate conventions are
+deliberately compatible. Many thanks to the ocrmac maintainers.
+
+## License
+
+MIT — see [LICENSE](LICENSE).
data/bin/mac_ocr_helper
ADDED
Binary file
data/ext/mac_ocr_helper/build.sh
ADDED
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+#
+# Build the mac_ocr_helper as a universal (arm64 + x86_64) binary.
+#
+# Output: bin/mac_ocr_helper
+#
+# Requires: macOS with Xcode Command Line Tools (provides swiftc and lipo).
+#
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
+SRC="$ROOT/ext/mac_ocr_helper/main.swift"
+BUILD_DIR="$ROOT/ext/mac_ocr_helper/build"
+OUT="$ROOT/bin/mac_ocr_helper"
+
+mkdir -p "$BUILD_DIR" "$(dirname "$OUT")"
+
+echo "==> building arm64 slice"
+xcrun swiftc -O \
+  -target arm64-apple-macos11 \
+  -o "$BUILD_DIR/mac_ocr_helper-arm64" \
+  "$SRC"
+
+echo "==> building x86_64 slice"
+xcrun swiftc -O \
+  -target x86_64-apple-macos11 \
+  -o "$BUILD_DIR/mac_ocr_helper-x86_64" \
+  "$SRC"
+
+echo "==> lipo'ing universal binary -> $OUT"
+xcrun lipo -create \
+  "$BUILD_DIR/mac_ocr_helper-arm64" \
+  "$BUILD_DIR/mac_ocr_helper-x86_64" \
+  -output "$OUT"
+
+xcrun lipo -info "$OUT"
+echo "==> done. size: $(du -h "$OUT" | cut -f1)"
data/ext/mac_ocr_helper/main.swift
ADDED
@@ -0,0 +1,183 @@
+import Foundation
+import Vision
+import CoreGraphics
+import ImageIO
+
+// MARK: - Exit codes
+
+let exitSuccess: Int32 = 0
+let exitVisionError: Int32 = 1
+let exitUsageError: Int32 = 2
+
+// MARK: - JSON I/O
+
+func emitError(_ message: String, code: String, exitCode: Int32) -> Never {
+    let payload: [String: Any] = ["error": message, "code": code]
+    if let data = try? JSONSerialization.data(withJSONObject: payload, options: []),
+       let line = String(data: data, encoding: .utf8) {
+        FileHandle.standardError.write(Data(line.utf8))
+        FileHandle.standardError.write(Data("\n".utf8))
+    }
+    exit(exitCode)
+}
+
+func emitResults(_ observations: [VNRecognizedTextObservation], confidenceThreshold: Float) {
+    var rows: [[String: Any]] = []
+    rows.reserveCapacity(observations.count)
+    for obs in observations {
+        guard let top = obs.topCandidates(1).first else { continue }
+        let conf = top.confidence
+        if conf < confidenceThreshold { continue }
+        let box = obs.boundingBox
+        rows.append([
+            "text": top.string,
+            "confidence": Double(conf),
+            "bbox": [
+                Double(box.origin.x),
+                Double(box.origin.y),
+                Double(box.size.width),
+                Double(box.size.height)
+            ]
+        ])
+    }
+    let payload: [String: Any] = ["results": rows]
+    guard let data = try? JSONSerialization.data(withJSONObject: payload, options: []) else {
+        emitError("failed to serialize results", code: "SERIALIZE_ERROR", exitCode: exitVisionError)
+    }
+    FileHandle.standardOutput.write(data)
+    FileHandle.standardOutput.write(Data("\n".utf8))
+}
+
+// MARK: - Argument parsing
+
+struct CLIArgs {
+    var imagePath: String
+    var level: VNRequestTextRecognitionLevel = .accurate
+    var languages: [String]? = nil
+    var confidence: Float = 0.0
+}
+
+func parseArgs(_ argv: [String]) -> CLIArgs {
+    var imagePath: String? = nil
+    var level: VNRequestTextRecognitionLevel = .accurate
+    var languages: [String]? = nil
+    var confidence: Float = 0.0
+
+    var i = 1
+    while i < argv.count {
+        let arg = argv[i]
+        switch arg {
+        case "--level":
+            guard i + 1 < argv.count else {
+                emitError("--level requires a value (accurate|fast)", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            let v = argv[i + 1]
+            switch v {
+            case "accurate": level = .accurate
+            case "fast": level = .fast
+            default:
+                emitError("invalid --level: \(v) (expected accurate|fast)", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            i += 2
+        case "--languages":
+            guard i + 1 < argv.count else {
+                emitError("--languages requires a comma-separated value", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            let raw = argv[i + 1]
+            languages = raw.split(separator: ",").map { String($0).trimmingCharacters(in: .whitespaces) }.filter { !$0.isEmpty }
+            i += 2
+        case "--confidence":
+            guard i + 1 < argv.count, let v = Float(argv[i + 1]) else {
+                emitError("--confidence requires a float value", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            confidence = v
+            i += 2
+        case "-h", "--help":
+            let usage = """
+            Usage: mac_ocr_helper <image_path> [--level accurate|fast] [--languages ko-KR,en-US] [--confidence 0.0]
+
+            Calls Apple's Vision VNRecognizeTextRequest on <image_path> and prints
+            a JSON document to stdout:
+
+            {"results": [{"text": "...", "confidence": 0.5, "bbox": [x, y, w, h]}]}
+
+            Bounding boxes are in Vision's normalized coordinate space (0..1,
+            origin at bottom-left).
+
+            Exit codes:
+            0 — success
+            1 — Vision request failed (error JSON on stderr)
+            2 — usage / file / argument error (error JSON on stderr)
+            """
+            FileHandle.standardOutput.write(Data(usage.utf8))
+            FileHandle.standardOutput.write(Data("\n".utf8))
+            exit(exitSuccess)
+        default:
+            if arg.hasPrefix("--") {
+                emitError("unknown option: \(arg)", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            if imagePath != nil {
+                emitError("unexpected positional argument: \(arg)", code: "BAD_ARGS", exitCode: exitUsageError)
+            }
+            imagePath = arg
+            i += 1
+        }
+    }
+
+    guard let path = imagePath else {
+        emitError("missing required <image_path> argument", code: "BAD_ARGS", exitCode: exitUsageError)
+    }
+
+    return CLIArgs(imagePath: path, level: level, languages: languages, confidence: confidence)
+}
+
+// MARK: - Main
+
+let args = parseArgs(CommandLine.arguments)
+
+let fm = FileManager.default
+var isDir: ObjCBool = false
+guard fm.fileExists(atPath: args.imagePath, isDirectory: &isDir), !isDir.boolValue else {
+    emitError("image file not found: \(args.imagePath)", code: "FILE_NOT_FOUND", exitCode: exitUsageError)
+}
+
+let url = URL(fileURLWithPath: args.imagePath)
+
+// Build request
+let request = VNRecognizeTextRequest()
+request.recognitionLevel = args.level
+request.usesLanguageCorrection = true
+
+if let langs = args.languages {
+    // Validate against the set of supported languages when the API is available.
+    // On macOS 11 the instance method does not exist; we let Vision reject any
+    // invalid language at perform time instead.
+    if #available(macOS 12.0, *) {
+        do {
+            let supported = try request.supportedRecognitionLanguages()
+            let unsupported = langs.filter { !supported.contains($0) }
+            if !unsupported.isEmpty {
+                emitError(
+                    "unsupported language(s): \(unsupported.joined(separator: ",")). supported=\(supported.joined(separator: ","))",
+                    code: "INVALID_LANGUAGE",
+                    exitCode: exitUsageError
+                )
+            }
+        } catch {
+            emitError("failed to query supported languages: \(error.localizedDescription)", code: "VISION_ERROR", exitCode: exitVisionError)
+        }
+    }
+    request.recognitionLanguages = langs
+}
+
+// Perform request
+let handler = VNImageRequestHandler(url: url, options: [:])
+do {
+    try handler.perform([request])
+} catch {
+    emitError("Vision perform failed: \(error.localizedDescription)", code: "VISION_ERROR", exitCode: exitVisionError)
+}
+
+let observations = (request.results ?? [])
+emitResults(observations, confidenceThreshold: args.confidence)
+exit(exitSuccess)
data/lib/mac_ocr/error.rb
ADDED
@@ -0,0 +1,18 @@
+module MacOcr
+  class Error < StandardError; end
+
+  class HelperNotFoundError < Error; end
+
+  class InvalidArgumentError < Error; end
+
+  class HelperExecutionError < Error
+    attr_reader :code, :exit_status, :stderr
+
+    def initialize(message, code: nil, exit_status: nil, stderr: nil)
+      super(message)
+      @code = code
+      @exit_status = exit_status
+      @stderr = stderr
+    end
+  end
+end
data/lib/mac_ocr/helper.rb
ADDED
@@ -0,0 +1,25 @@
+require_relative "error"
+
+module MacOcr
+  module Helper
+    BINARY_NAME = "mac_ocr_helper".freeze
+
+    # Returns the absolute path to the bundled helper binary.
+    # Raises HelperNotFoundError if the binary cannot be located.
+    def self.binary_path
+      candidate = File.expand_path("../../../bin/#{BINARY_NAME}", __FILE__)
+      unless File.executable?(candidate)
+        raise HelperNotFoundError, <<~MSG.strip
+          mac_ocr helper binary not found or not executable at:
+          #{candidate}
+
+          If you are developing from a git checkout, build it with:
+          rake helper:build
+
+          mac_ocr is only supported on macOS.
+        MSG
+      end
+      candidate
+    end
+  end
+end
data/lib/mac_ocr/ocr.rb
ADDED
@@ -0,0 +1,106 @@
+require "json"
+require "open3"
+
+require_relative "error"
+require_relative "helper"
+
+module MacOcr
+  # Performs text recognition on a single image using the bundled Swift helper
+  # that calls Apple's Vision framework (VNRecognizeTextRequest).
+  #
+  # Example:
+  #   ocr = MacOcr::OCR.new(
+  #     "image.png",
+  #     recognition_level: :accurate,
+  #     language_preference: ["ko-KR", "en-US"]
+  #   )
+  #   ocr.recognize
+  #   # => [["text", 0.5, [x, y, w, h]], ...]
+  #
+  # Bounding boxes are in Vision's normalized coordinate space (0..1, origin
+  # at bottom-left) — passed through unchanged from the underlying API.
+  class OCR
+    VALID_LEVELS = %i[accurate fast].freeze
+
+    attr_reader :image_path, :recognition_level, :language_preference
+
+    def initialize(image_path, recognition_level: :accurate, language_preference: nil)
+      raise InvalidArgumentError, "image_path must be a String, got #{image_path.class}" unless image_path.is_a?(String)
+      unless VALID_LEVELS.include?(recognition_level)
+        raise InvalidArgumentError, "recognition_level must be one of #{VALID_LEVELS.inspect}, got #{recognition_level.inspect}"
+      end
+      if language_preference && !language_preference.is_a?(Array)
+        raise InvalidArgumentError, "language_preference must be an Array or nil, got #{language_preference.class}"
+      end
+
+      @image_path = image_path
+      @recognition_level = recognition_level
+      @language_preference = language_preference
+    end
+
+    # Returns an array of [text, confidence, [x, y, w, h]] tuples.
+    def recognize
+      cmd = build_command
+      stdout, stderr, status = Open3.capture3(*cmd)
+
+      unless status.success?
+        raise_from_failure(stderr, status)
+      end
+
+      payload = parse_json(stdout)
+      results = payload["results"]
+      unless results.is_a?(Array)
+        raise HelperExecutionError.new(
+          "helper returned malformed payload (missing 'results' array)",
+          exit_status: status.exitstatus,
+          stderr: stderr
+        )
+      end
+
+      results.map { |row| [row["text"], row["confidence"], row["bbox"]] }
+    end
+
+    private
+
+    def build_command
+      cmd = [Helper.binary_path, @image_path, "--level", @recognition_level.to_s]
+      if @language_preference && !@language_preference.empty?
+        cmd << "--languages" << @language_preference.join(",")
+      end
+      cmd
+    end
+
+    def parse_json(stdout)
+      JSON.parse(stdout)
+    rescue JSON::ParserError => e
+      raise HelperExecutionError.new(
+        "helper produced invalid JSON: #{e.message}",
+        stderr: stdout
+      )
+    end
+
+    def raise_from_failure(stderr, status)
+      code, message = extract_error(stderr)
+      case code
+      when "BAD_ARGS", "FILE_NOT_FOUND", "INVALID_LANGUAGE"
+        raise InvalidArgumentError, message || "helper rejected arguments (exit #{status.exitstatus})"
+      else
+        raise HelperExecutionError.new(
+          message || "helper exited with status #{status.exitstatus}",
+          code: code,
+          exit_status: status.exitstatus,
+          stderr: stderr
+        )
+      end
+    end
+
+    def extract_error(stderr)
+      return [nil, nil] if stderr.nil? || stderr.strip.empty?
+
+      parsed = JSON.parse(stderr)
+      [parsed["code"], parsed["error"]]
+    rescue JSON::ParserError
+      [nil, stderr.strip]
+    end
+  end
+end
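The helper accepts `--confidence`, but `build_command` above passes only `--level` and `--languages`, so the helper's threshold stays at its default of 0.0 and every recognized line is returned. A caller who wants a cutoff can filter the tuples in Ruby. A minimal sketch; `filter_by_confidence` and the sample rows are hypothetical and not part of the gem:

```ruby
# Hypothetical post-filter (not part of mac_ocr): keep only rows whose
# confidence is at least min_confidence. This mirrors the helper's own rule,
# which skips observations with confidence below the threshold.
def filter_by_confidence(rows, min_confidence)
  rows.select { |_text, confidence, _bbox| confidence >= min_confidence }
end

# Made-up rows in the [text, confidence, bbox] shape returned by recognize.
rows = [
  ["Hello", 1.0, [0.1, 0.6, 0.4, 0.2]],
  ["smudge", 0.3, [0.5, 0.1, 0.1, 0.1]]
]

kept = filter_by_confidence(rows, 0.5)
# kept.map(&:first) == ["Hello"]
```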
data/lib/mac_ocr.rb
ADDED
data/mac_ocr.gemspec
ADDED
@@ -0,0 +1,35 @@
+require_relative "lib/mac_ocr/version"
+
+Gem::Specification.new do |spec|
+  spec.name = "mac_ocr"
+  spec.version = MacOcr::VERSION
+  spec.authors = ["Team Milestone"]
+  spec.email = ["dev@team-milestone.io"]
+
+  spec.summary = "Ruby OCR on macOS via the Apple Vision framework."
+  spec.description = "A Ruby port of the small surface of the Python ocrmac package. " \
+                     "Wraps a bundled Swift helper that calls VNRecognizeTextRequest, " \
+                     "returning text, confidence, and normalized bounding boxes."
+  spec.homepage = "https://github.com/TeamMilestone/mac_ocr"
+  spec.license = "MIT"
+
+  spec.required_ruby_version = ">= 3.0.0"
+  spec.platform = Gem::Platform.new("universal-darwin")
+
+  spec.metadata["homepage_uri"] = spec.homepage
+  spec.metadata["source_code_uri"] = spec.homepage
+  spec.metadata["bug_tracker_uri"] = "#{spec.homepage}/issues"
+  spec.metadata["changelog_uri"] = "#{spec.homepage}/blob/main/CHANGELOG.md"
+
+  spec.files = Dir[
+    "lib/**/*.rb",
+    "ext/mac_ocr_helper/main.swift",
+    "ext/mac_ocr_helper/build.sh",
+    "bin/mac_ocr_helper",
+    "LICENSE",
+    "README.md",
+    "CHANGELOG.md",
+    "mac_ocr.gemspec"
+  ]
+  spec.require_paths = ["lib"]
+end
metadata
ADDED
@@ -0,0 +1,58 @@
+--- !ruby/object:Gem::Specification
+name: mac_ocr
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: universal-darwin
+authors:
+- Team Milestone
+bindir: bin
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies: []
+description: A Ruby port of the small surface of the Python ocrmac package. Wraps
+  a bundled Swift helper that calls VNRecognizeTextRequest, returning text, confidence,
+  and normalized bounding boxes.
+email:
+- dev@team-milestone.io
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE
+- README.md
+- bin/mac_ocr_helper
+- ext/mac_ocr_helper/build.sh
+- ext/mac_ocr_helper/main.swift
+- lib/mac_ocr.rb
+- lib/mac_ocr/error.rb
+- lib/mac_ocr/helper.rb
+- lib/mac_ocr/ocr.rb
+- lib/mac_ocr/version.rb
+- mac_ocr.gemspec
+homepage: https://github.com/TeamMilestone/mac_ocr
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/TeamMilestone/mac_ocr
+  source_code_uri: https://github.com/TeamMilestone/mac_ocr
+  bug_tracker_uri: https://github.com/TeamMilestone/mac_ocr/issues
+  changelog_uri: https://github.com/TeamMilestone/mac_ocr/blob/main/CHANGELOG.md
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 4.0.6
+specification_version: 4
+summary: Ruby OCR on macOS via the Apple Vision framework.
+test_files: []