activestorage-ocr 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c2944589a15452bdc9189df16529baf298e7db1d39898c6a127558dd8e55e58a
4
- data.tar.gz: bc222d478d3c122f1bdba2e0f7a8b58893a9c0581ec86d06c2c8025410d36a20
3
+ metadata.gz: 77f12aaeaf8f89365fc3e7789d375af8ac5fdc8920c4dd6ecff838e8268a6264
4
+ data.tar.gz: 5d9a43201af05e03469bf61d7b6bd9b77aec2ffb489073a9c768dba0638ad2fb
5
5
  SHA512:
6
- metadata.gz: b6569b0027e7976cb5afa6cf45294eb1680e7f6f769be27db98794f64cf5079f6254587f56b70adcb72a033b566bb6836879b7563fab18470b15ec7434b215b5
7
- data.tar.gz: 45839fed9f4f618c50246c89f7acc3345a3c57d52a07c348fed17420bbb465ecf883f6d1b711f1b33ea57a35975995cdbe8ab26c1486ade6267778a34db8cf59
6
+ metadata.gz: 9e4858ee78b144076beade1a20a7458ad478149937993a8cd1a1e82c9af13cbced8c6c4686fc2bdf2d722747830572566166a0dd1afd74c535021b97407f2d88
7
+ data.tar.gz: 24ea227e389f377a192f5efd533f410fc1cd09b7adf3e322b555334ac770f36cf4d94fb5fffae602b66560991a39c41f79ca18ed9e271657b28dbc3181f5e3f6
data/README.md CHANGED
@@ -3,17 +3,19 @@
3
3
  [![CI](https://github.com/Cause-of-a-Kind/activestorage-ocr/actions/workflows/ci.yml/badge.svg)](https://github.com/Cause-of-a-Kind/activestorage-ocr/actions/workflows/ci.yml)
4
4
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
5
5
 
6
- OCR for Rails Active Storage attachments, powered by Rust and [ocrs](https://github.com/robertknight/ocrs).
6
+ OCR for Rails Active Storage attachments, powered by Rust.
7
7
 
8
8
  ## Overview
9
9
 
10
- `activestorage-ocr` provides optical character recognition (OCR) for files stored with Active Storage. It uses a high-performance Rust server with the pure-Rust `ocrs` OCR engine, eliminating the need for third-party OCR services or system-level dependencies.
10
+ `activestorage-ocr` provides optical character recognition (OCR) for files stored with Active Storage. It uses a high-performance Rust server with your choice of OCR engine, eliminating the need for third-party OCR services.
11
11
 
12
12
  **Key Features:**
13
- - **Pure Rust** - No Tesseract or system dependencies required
14
- - **Self-contained** - Models download automatically on first run (~50MB)
15
- - **Fast** - Processes images in ~150ms
16
- - **HTTP/JSON API** - Easy to debug and integrate
13
+ - **Two OCR Engines** - Choose the right tool for the job:
14
+ - **ocrs** (default) - Pure Rust, zero system dependencies
15
+ - **leptess** - Tesseract-based, statically linked (no system Tesseract needed)
16
+ - **Self-contained** - Pre-built binaries with no system dependencies
17
+ - **Per-request engine selection** - Use different engines for different files
18
+ - **Automatic** - OCR runs automatically when files are uploaded via Active Storage
17
19
 
18
20
  **Supported Formats:**
19
21
  - Images: PNG, JPEG, TIFF, WebP, GIF, BMP
@@ -24,6 +26,24 @@ OCR for Rails Active Storage attachments, powered by Rust and [ocrs](https://git
24
26
  - **Ruby gem** provides seamless Rails integration
25
27
  - Simple HTTP/JSON protocol for easy debugging
26
28
 
29
+ ### Choosing an Engine
30
+
31
+ | Engine | Binary Size | Notes |
32
+ |--------|-------------|-------|
33
+ | `ocrs` | ~9 MB | Pure Rust, modern neural network approach |
34
+ | `leptess` | ~13 MB | Tesseract-based, well-established OCR engine |
35
+ | `all` | ~15 MB | Includes both engines for comparison |
36
+
37
+ **Which engine should I use?**
38
+
39
+ Start with `ocrs` (the default). It works well for most documents and has a smaller binary. If results aren't satisfactory, try `leptess`—it uses Tesseract, a mature OCR engine that's been around for decades.
40
+
41
+ Neither engine is universally better—performance varies by image. The `all` variant lets you compare both engines on your actual documents to see which works best for your use case.
42
+
43
+ **Platform Support:**
44
+ - **Linux (x86_64):** Fully supported
45
+ - **macOS / Windows:** ocrs-only variant works; Tesseract variants require building from source
46
+
27
47
  ## Requirements
28
48
 
29
49
  - Ruby 3.2+
@@ -32,53 +52,116 @@ OCR for Rails Active Storage attachments, powered by Rust and [ocrs](https://git
32
52
 
33
53
  ## Installation
34
54
 
35
- Add to your Gemfile:
55
+ **1. Add to your Gemfile:**
36
56
 
37
57
  ```ruby
38
58
  gem "activestorage-ocr"
39
59
  ```
40
60
 
41
- Then install the OCR server binary:
61
+ **2. Install the gem and OCR server binary:**
42
62
 
43
63
  ```bash
44
64
  bundle install
45
65
  bin/rails activestorage_ocr:install
46
66
  ```
47
67
 
48
- ## Quick Start
68
+ By default, this installs the `ocrs` engine (~9 MB binary). To install with Tesseract support:
69
+
70
+ ```bash
71
+ # Install both engines (recommended for comparing results)
72
+ bin/rails activestorage_ocr:install variant=all
73
+
74
+ # Or install only the Tesseract-based engine
75
+ bin/rails activestorage_ocr:install variant=leptess
76
+ ```
77
+
78
+ **No system dependencies required.** The Tesseract OCR engine and training data are statically linked into the binary.
79
+
80
+ **3. Add the OCR server to your `Procfile.dev`:**
81
+
82
+ ```procfile
83
+ web: bin/rails server
84
+ ocr: activestorage-ocr-server --host 127.0.0.1 --port 9292
85
+ ```
86
+
87
+ Now when you run `bin/dev`, the OCR server starts automatically alongside Rails.
88
+
89
+ > **Note:** If you don't have a `Procfile.dev`, create one. Rails 7+ apps typically use `bin/dev` with [foreman](https://github.com/ddollar/foreman) or [overmind](https://github.com/DarthSim/overmind) to manage multiple processes.
90
+
91
+ ## Usage
92
+
93
+ Once installed, OCR happens **automatically** when you upload images or PDFs through Active Storage. The extracted text is stored in the blob's metadata.
94
+
95
+ ### Accessing OCR Results
96
+
97
+ ```ruby
98
+ # After uploading a file
99
+ document.file.analyze # Triggers OCR (usually happens automatically)
100
+
101
+ # Access the results from metadata
102
+ document.file.metadata["ocr_text"] # => "Extracted text..."
103
+ document.file.metadata["ocr_confidence"] # => 0.85
104
+ document.file.metadata["ocr_processed_at"] # => "2024-12-24T12:00:00Z"
105
+ ```
106
+
107
+ ### Helper Methods (Optional)
108
+
109
+ Add convenience methods to your model:
110
+
111
+ ```ruby
112
+ class Document < ApplicationRecord
113
+ has_one_attached :file
114
+
115
+ def ocr_text
116
+ file.metadata["ocr_text"]
117
+ end
118
+
119
+ def ocr_confidence
120
+ file.metadata["ocr_confidence"]
121
+ end
122
+
123
+ def ocr_processed?
124
+ file.metadata["ocr_processed_at"].present?
125
+ end
126
+ end
127
+ ```
128
+
129
+ ### Using the Client Directly
49
130
 
50
- 1. **Start the OCR server:**
131
+ You can also use the client directly for more control:
51
132
 
52
- ```bash
53
- bin/rails activestorage_ocr:start
54
- ```
133
+ ```ruby
134
+ client = ActiveStorage::Ocr::Client.new
55
135
 
56
- 2. **Use the client in your Rails app:**
136
+ # Check server health
137
+ client.healthy? # => true
57
138
 
58
- ```ruby
59
- # In rails console or your code
60
- client = ActiveStorage::Ocr::Client.new
139
+ # Extract text from a file path
140
+ result = client.extract_text_from_path("/path/to/image.png")
141
+ result.text # => "Extracted text..."
142
+ result.confidence # => 0.95
61
143
 
62
- # Check server health
63
- client.healthy? # => true
144
+ # Extract text from an Active Storage attachment
145
+ result = client.extract_text(document.file)
64
146
 
65
- # Extract text from a file
66
- result = client.extract_text_from_path("/path/to/image.png", content_type: "image/png")
67
- result.text # => "Extracted text..."
68
- result.confidence # => 0.95
147
+ # Per-request engine selection (requires all-engines variant)
148
+ result = client.extract_text(document.file, engine: :ocrs) # Pure Rust engine
149
+ result = client.extract_text(document.file, engine: :leptess) # Tesseract engine
69
150
 
70
- # Extract text from an Active Storage attachment
71
- result = client.extract_text(document.file)
72
- ```
151
+ # Compare engines side-by-side
152
+ comparison = client.compare(document.file)
153
+ comparison[:ocrs].confidence # => 0.95
154
+ comparison[:leptess].confidence # => 0.92
155
+ ```
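One way to act on such a comparison is to keep whichever result reports the higher confidence. The sketch below is not part of the gem: `best_ocr_result` is a hypothetical helper, and `FakeResult` is a stand-in that assumes only the `text` and `confidence` readers shown in the README example. Confidence scores from two different engines are not necessarily calibrated against each other, so treat this as a heuristic rather than a ranking.

```ruby
# Hypothetical stand-in for the gem's result object; only the two
# readers used in the README example are modelled here.
FakeResult = Struct.new(:text, :confidence, keyword_init: true)

# Returns [engine_name, result] for the entry with the highest confidence.
# `comparison` is a Hash shaped like the value returned by client.compare,
# e.g. { ocrs: result, leptess: result }. A missing confidence counts as 0.0.
def best_ocr_result(comparison)
  comparison.max_by { |_engine, result| result.confidence.to_f }
end

# Illustrative data only; these are not real OCR outputs.
comparison = {
  ocrs: FakeResult.new(text: "Invoice 1042", confidence: 0.95),
  leptess: FakeResult.new(text: "Invoice 1O42", confidence: 0.92)
}

engine, result = best_ocr_result(comparison)
puts "#{engine}: #{result.text}"
```

In an application, the hash would come from `client.compare(document.file)`, which requires a server binary that includes both engines.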
73
156
 
74
157
  ## Configuration
75
158
 
76
159
  ```ruby
77
160
  # config/initializers/activestorage_ocr.rb
78
161
  ActiveStorage::Ocr.configure do |config|
79
- config.server_host = "127.0.0.1"
80
- config.server_port = 9292
81
- config.timeout = 60
162
+ config.server_url = ENV.fetch("ACTIVESTORAGE_OCR_SERVER_URL", "http://127.0.0.1:9292")
163
+ config.timeout = 60 # Request timeout in seconds
164
+ config.open_timeout = 10 # Connection timeout in seconds
82
165
  end
83
166
  ```
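The settings above can also come entirely from the environment. As a rough sketch of that pattern, the snippet below reads the documented variables from any Hash-like object and falls back to the documented defaults. `OcrSettings` and `load_ocr_settings` are hypothetical names used for illustration; this is not the gem's `Configuration` class.

```ruby
# Hypothetical value object; the gem's own Configuration class is not reproduced here.
OcrSettings = Struct.new(:server_url, :timeout, :open_timeout, :engine, keyword_init: true)

# Engine names accepted by this sketch (mirrors the two documented engines).
VALID_ENGINES = %i[ocrs leptess].freeze

# Builds settings from an ENV-like object, using the documented defaults
# when a variable is absent. Raises ArgumentError for an unknown engine.
def load_ocr_settings(env = ENV)
  engine = env.fetch("ACTIVESTORAGE_OCR_ENGINE", "ocrs").to_sym
  unless VALID_ENGINES.include?(engine)
    raise ArgumentError, "Invalid engine: #{engine}. Valid engines: #{VALID_ENGINES.join(', ')}"
  end

  OcrSettings.new(
    server_url: env.fetch("ACTIVESTORAGE_OCR_SERVER_URL", "http://127.0.0.1:9292"),
    timeout: env.fetch("ACTIVESTORAGE_OCR_TIMEOUT", 30).to_i,
    open_timeout: env.fetch("ACTIVESTORAGE_OCR_OPEN_TIMEOUT", 5).to_i,
    engine: engine
  )
end

# A plain Hash stands in for ENV so the example has no side effects.
settings = load_ocr_settings({ "ACTIVESTORAGE_OCR_TIMEOUT" => "60" })
puts settings.timeout # taken from the supplied hash
puts settings.engine  # falls back to the default engine
```

Passing the environment in as an argument keeps the lookup easy to exercise in tests without mutating the real `ENV`.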
84
167
 
@@ -86,8 +169,133 @@ end
86
169
 
87
170
  | Variable | Default | Description |
88
171
  |----------|---------|-------------|
89
- | `ACTIVESTORAGE_OCR_HOST` | `127.0.0.1` | Server host |
90
- | `ACTIVESTORAGE_OCR_PORT` | `9292` | Server port |
172
+ | `ACTIVESTORAGE_OCR_SERVER_URL` | `http://127.0.0.1:9292` | Full URL to the OCR server |
173
+
174
+ ## Production Deployment
175
+
176
+ For production deployments, the OCR server binary needs to be installed in your app's `bin/` directory (not the gem directory) so it can be referenced from your Procfile.
177
+
178
+ ### Setup for Production
179
+
180
+ **1. Run the install generator:**
181
+
182
+ ```bash
183
+ rails generate activestorage_ocr:install
184
+ ```
185
+
186
+ This creates:
187
+ - `bin/activestorage-ocr-server` - A wrapper script that runs the OCR server
188
+ - `bin/dist/activestorage-ocr-server` - The actual binary (gitignored)
189
+
190
+ **2. Add to your Procfile:**
191
+
192
+ ```procfile
193
+ web: bundle exec puma -C config/puma.rb
194
+ ocr: bin/activestorage-ocr-server --host 127.0.0.1 --port 9292
195
+ ```
196
+
197
+ ### Docker Deployment
198
+
199
+ In your `Dockerfile`, run the install task during the build to download the binary:
200
+
201
+ ```dockerfile
202
+ # Install gems
203
+ RUN bundle install
204
+
205
+ # Install OCR server binary to bin/dist/
206
+ # Use variant=all to include both ocrs and Tesseract engines (~15 MB)
207
+ RUN bundle exec rails activestorage_ocr:install variant=all path=./bin/dist
208
+
209
+ # Or for the smaller ocrs-only variant (~9 MB):
210
+ # RUN bundle exec rails activestorage_ocr:install path=./bin/dist
211
+ ```
212
+
213
+ Use foreman to manage both processes:
214
+
215
+ ```dockerfile
216
+ # Start every process listed in the Procfile
217
+ CMD ["bundle", "exec", "foreman", "start"]
218
+ ```
219
+
220
+ ### Fly.io Deployment
221
+
222
+ **fly.toml configuration:**
223
+
224
+ ```toml
225
+ app = "your-app-name"
226
+ primary_region = "iad"
227
+
228
+ [deploy]
229
+ # Note: Don't use release_command for SQLite with volumes
230
+ # Migrations run in docker-entrypoint instead
231
+
232
+ [env]
233
+ RAILS_ENV = "production"
234
+ ACTIVESTORAGE_OCR_SERVER_URL = "http://127.0.0.1:9292"
235
+
236
+ [http_service]
237
+ internal_port = 8080
238
+ force_https = true
239
+
240
+ [[mounts]]
241
+ source = "data"
242
+ destination = "/rails/storage"
243
+
244
+ [[vm]]
245
+ memory = "1024mb"
246
+ cpu_kind = "shared"
247
+ cpus = 2
248
+ ```
249
+
250
+ **Procfile for Fly.io:**
251
+
252
+ ```procfile
253
+ web: bundle exec puma -C config/puma.rb
254
+ ocr: bin/activestorage-ocr-server --host 127.0.0.1 --port 9292
255
+ ```
256
+
257
+ **Important notes for Fly.io:**
258
+ - Use `foreman` as the entrypoint to run both processes
259
+ - The OCR server binds to `127.0.0.1` (internal only)
260
+ - Set `ACTIVESTORAGE_OCR_SERVER_URL` env var to `http://127.0.0.1:9292`
261
+ - For SQLite with volumes, run migrations in `docker-entrypoint` not `release_command`
262
+
263
+ ### Environment Variables
264
+
265
+ | Variable | Default | Description |
266
+ |----------|---------|-------------|
267
+ | `ACTIVESTORAGE_OCR_SERVER_URL` | `http://127.0.0.1:9292` | URL where the OCR server is running |
268
+ | `ACTIVESTORAGE_OCR_TIMEOUT` | `30` | Request timeout in seconds |
269
+ | `ACTIVESTORAGE_OCR_OPEN_TIMEOUT` | `5` | Connection timeout in seconds |
270
+
271
+ ### Troubleshooting
272
+
273
+ **Binary not found:**
274
+ ```
275
+ Error: bin/activestorage-ocr-server: No such file or directory
276
+ ```
277
+ Solution: Run `rails generate activestorage_ocr:install` or `rails activestorage_ocr:install path=./bin/dist`
278
+
279
+ **Connection refused:**
280
+ ```
281
+ Faraday::ConnectionFailed: Connection refused
282
+ ```
283
+ Solution: Ensure the OCR server is running and `ACTIVESTORAGE_OCR_SERVER_URL` is correctly configured.
284
+
285
+ **Timeout errors:**
286
+ ```
287
+ Faraday::TimeoutError
288
+ ```
289
+ Solution: Increase timeout values in the initializer or reduce image/PDF sizes.
290
+
291
+ **Health check:**
292
+ ```bash
293
+ # Verify the OCR server is responding
294
+ curl http://localhost:9292/health
295
+
296
+ # Or via rake task
297
+ bin/rails activestorage_ocr:health
298
+ ```
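When the web process and the OCR server start together, the first requests can arrive before the server is listening. A small polling loop is one way to wait for it. The helper below is a hypothetical sketch, not something the gem provides: `wait_for_ocr_server`, its keyword arguments, and the retry counts are all assumptions, and only the `/health` endpoint comes from the README.

```ruby
require "net/http"
require "uri"

# Hypothetical helper, not part of the gem: polls the /health endpoint
# until it answers with a 2xx status or the attempts run out.
# `probe` can be replaced with any callable that returns true or false.
def wait_for_ocr_server(url: "http://127.0.0.1:9292/health", attempts: 10, delay: 0.5, probe: nil)
  probe ||= lambda do
    Net::HTTP.get_response(URI(url)).is_a?(Net::HTTPSuccess)
  rescue SystemCallError, SocketError, Timeout::Error
    # Connection refused, DNS failure, or timeout: treat as "not ready yet".
    false
  end

  attempts.times do |i|
    return true if probe.call
    sleep(delay) if i < attempts - 1
  end
  false
end

# Simulated probe that succeeds on the third call, so no server is needed here.
calls = 0
fake_probe = -> { (calls += 1) >= 3 }
puts wait_for_ocr_server(attempts: 5, delay: 0, probe: fake_probe)
```

With the default probe, the call would be `wait_for_ocr_server` with no arguments, pointed at whatever URL the server is actually bound to.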
91
299
 
92
300
  ## Rake Tasks
93
301
 
@@ -95,7 +303,13 @@ end
95
303
  # Install the OCR server binary for your platform
96
304
  bin/rails activestorage_ocr:install
97
305
 
98
- # Start the OCR server
306
+ # Install with Tesseract support (all-engines variant)
307
+ bin/rails activestorage_ocr:install variant=all
308
+
309
+ # Install to a custom path
310
+ bin/rails activestorage_ocr:install variant=all path=./bin/dist
311
+
312
+ # Start the OCR server (for manual testing)
99
313
  bin/rails activestorage_ocr:start
100
314
 
101
315
  # Check server health
@@ -103,6 +317,9 @@ bin/rails activestorage_ocr:health
103
317
 
104
318
  # Show binary info (platform, path, version)
105
319
  bin/rails activestorage_ocr:info
320
+
321
+ # Compare both OCR engines on a file
322
+ bin/rails activestorage_ocr:compare file=/path/to/image.png
106
323
  ```
107
324
 
108
325
  ## API Endpoints
@@ -112,8 +329,10 @@ The Rust server exposes these HTTP endpoints:
112
329
  | Endpoint | Method | Description |
113
330
  |----------|--------|-------------|
114
331
  | `/health` | GET | Health check |
115
- | `/info` | GET | Server info and supported formats |
116
- | `/ocr` | POST | Extract text from uploaded file |
332
+ | `/info` | GET | Server info, available engines, and supported formats |
333
+ | `/ocr` | POST | Extract text using default engine |
334
+ | `/ocr/ocrs` | POST | Extract text using ocrs engine |
335
+ | `/ocr/leptess` | POST | Extract text using Tesseract engine |
117
336
 
118
337
  ### Example with curl
119
338
 
@@ -121,9 +340,16 @@ The Rust server exposes these HTTP endpoints:
121
340
  # Health check
122
341
  curl http://localhost:9292/health
123
342
 
124
- # OCR an image
343
+ # Server info (shows available engines)
344
+ curl http://localhost:9292/info
345
+
346
+ # OCR with default engine
125
347
  curl -X POST http://localhost:9292/ocr \
126
348
  -F "file=@document.png;type=image/png"
349
+
350
+ # OCR with specific engine (requires all-engines variant)
351
+ curl -X POST http://localhost:9292/ocr/leptess \
352
+ -F "file=@document.png;type=image/png"
127
353
  ```
128
354
 
129
355
  ## Development
@@ -131,22 +357,38 @@ curl -X POST http://localhost:9292/ocr \
131
357
  ### Building from source
132
358
 
133
359
  ```bash
134
- # Build the Rust server
360
+ # Build with default engine (ocrs only)
135
361
  cd rust
136
362
  cargo build --release
137
363
 
364
+ # Build with all engines (ocrs + Tesseract)
365
+ cargo build --release --features all-engines
366
+
367
+ # Build with specific engine only
368
+ cargo build --release --features engine-leptess
369
+
138
370
  # The binary will be at rust/target/release/activestorage-ocr-server
139
371
  ```
140
372
 
373
+ For local development with a Rails app, you can install a locally-built binary:
374
+
375
+ ```bash
376
+ # In your Rails app directory
377
+ bin/rails activestorage_ocr:install source=/path/to/activestorage-ocr/rust/target/release/activestorage-ocr-server path=./bin/dist
378
+ ```
379
+
141
380
  ### Running tests
142
381
 
143
382
  ```bash
144
383
  # Ruby unit tests
145
384
  bundle exec rake test
146
385
 
147
- # Rust tests
386
+ # Rust tests (default engine)
148
387
  cd rust && cargo test
149
388
 
389
+ # Rust tests (all engines)
390
+ cd rust && cargo test --features all-engines -- --test-threads=1
391
+
150
392
  # Integration tests (requires server running)
151
393
  cd rust && ./target/release/activestorage-ocr-server &
152
394
  cd test/sandbox && RAILS_ENV=test bin/rails test
@@ -15,6 +15,7 @@ module ActiveStorage
15
15
  # After analysis, blobs will have the following metadata:
16
16
  # * +ocr_text+ - The extracted text
17
17
  # * +ocr_confidence+ - Confidence score (0.0 to 1.0)
18
+ # * +ocr_engine+ - The OCR engine used ("ocrs" or "leptess")
18
19
  # * +ocr_processed_at+ - ISO 8601 timestamp
19
20
  #
20
21
  # == Example
@@ -20,11 +20,23 @@ module ActiveStorage
20
20
  # * linux-x86_64 (Linux x86_64)
21
21
  # * linux-aarch64 (Linux ARM64)
22
22
  #
23
+ # == Binary Variants
24
+ #
25
+ # * :ocrs - Pure Rust OCR engine only (~9 MB, no system dependencies)
26
+ # * :leptess - Tesseract OCR engine only (~13 MB, no system dependencies)
27
+ # * :all - Both engines included (~15 MB)
28
+ #
23
29
  # == Usage
24
30
  #
25
- # # Install the binary for the current platform
31
+ # # Install the default (ocrs) binary for the current platform
26
32
  # ActiveStorage::Ocr::Binary.install!
27
33
  #
34
+ # # Install the leptess variant
35
+ # ActiveStorage::Ocr::Binary.install!(variant: :leptess)
36
+ #
37
+ # # Install all-engines variant
38
+ # ActiveStorage::Ocr::Binary.install!(variant: :all)
39
+ #
28
40
  # # Check if binary is installed
29
41
  # ActiveStorage::Ocr::Binary.installed? # => true
30
42
  #
@@ -38,6 +50,22 @@ module ActiveStorage
38
50
  # Name of the server binary.
39
51
  BINARY_NAME = "activestorage-ocr-server"
40
52
 
53
+ # Available binary variants with their download suffix and descriptions.
54
+ VARIANTS = {
55
+ ocrs: {
56
+ suffix: "",
57
+ description: "Pure Rust OCR engine (fast, no system dependencies)"
58
+ },
59
+ leptess: {
60
+ suffix: "-leptess",
61
+ description: "Tesseract OCR engine (better for messy images)"
62
+ },
63
+ all: {
64
+ suffix: "-all",
65
+ description: "All OCR engines included"
66
+ }
67
+ }.freeze
68
+
41
69
  class << self
42
70
  # Detects the current platform.
43
71
  #
@@ -107,24 +135,56 @@ module ActiveStorage
107
135
  ActiveStorage::Ocr::VERSION
108
136
  end
109
137
 
110
- # Returns the download URL for the current platform.
138
+ # Returns the download URL for the current platform and variant.
139
+ #
140
+ # ==== Parameters
141
+ #
142
+ # * +variant+ - The binary variant (:ocrs, :leptess, or :all)
111
143
  #
112
144
  # ==== Returns
113
145
  #
114
146
  # GitHub releases URL for the platform-specific tarball.
115
- def download_url
147
+ def download_url(variant: :ocrs)
148
+ variant = variant.to_sym
149
+ validate_variant!(variant)
116
150
  tag = "v#{version}"
117
- filename = "activestorage-ocr-server-#{platform}.tar.gz"
151
+ suffix = VARIANTS[variant][:suffix]
152
+ filename = "activestorage-ocr-server#{suffix}-#{platform}.tar.gz"
118
153
  "https://github.com/#{GITHUB_REPO}/releases/download/#{tag}/#{filename}"
119
154
  end
120
155
 
156
+ # Lists available binary variants.
157
+ #
158
+ # ==== Returns
159
+ #
160
+ # Array of variant names.
161
+ def available_variants
162
+ VARIANTS.keys
163
+ end
164
+
165
+ # Returns info about a specific variant.
166
+ #
167
+ # ==== Parameters
168
+ #
169
+ # * +variant+ - The variant name (:ocrs, :leptess, or :all)
170
+ #
171
+ # ==== Returns
172
+ #
173
+ # Hash with :suffix and :description keys.
174
+ def variant_info(variant)
175
+ validate_variant!(variant)
176
+ VARIANTS[variant.to_sym]
177
+ end
178
+
121
179
  # Downloads and installs the binary.
122
180
  #
123
- # Downloads from GitHub releases and extracts to the gem's bin directory.
181
+ # Downloads from GitHub releases and extracts to the specified directory.
124
182
  #
125
183
  # ==== Parameters
126
184
  #
127
185
  # * +force+ - If true, reinstalls even if already installed
186
+ # * +path+ - Custom installation directory (defaults to gem's bin directory)
187
+ # * +variant+ - The binary variant to install (:ocrs, :leptess, or :all)
128
188
  #
129
189
  # ==== Returns
130
190
  #
@@ -133,27 +193,52 @@ module ActiveStorage
133
193
  # ==== Raises
134
194
  #
135
195
  # RuntimeError if the download fails.
136
- def install!(force: false)
137
- return binary_path if installed? && !force
196
+ # ArgumentError if an invalid variant is specified.
197
+ def install!(force: false, path: nil, variant: :ocrs)
198
+ variant = variant.to_sym # accept a String or Symbol, as download_url does
+ validate_variant!(variant)
199
+ target_dir = path || install_dir
200
+ target_path = File.join(target_dir, BINARY_NAME)
201
+
202
+ if !force && File.executable?(target_path)
203
+ puts "Binary already installed at #{target_path}"
204
+ return target_path
205
+ end
138
206
 
139
- puts "Downloading activestorage-ocr-server for #{platform}..."
207
+ FileUtils.mkdir_p(target_dir)
140
208
 
141
- uri = URI(download_url)
209
+ variant_desc = VARIANTS[variant][:description]
210
+ puts "Downloading activestorage-ocr-server (#{variant_desc}) for #{platform}..."
211
+
212
+ url = download_url(variant: variant)
213
+ uri = URI(url)
142
214
  response = fetch_with_redirects(uri)
143
215
 
144
216
  unless response.is_a?(Net::HTTPSuccess)
217
+ feature_flag = variant == :ocrs ? "engine-ocrs" : (variant == :leptess ? "engine-leptess" : "all-engines")
145
218
  raise "Failed to download binary: #{response.code} #{response.message}\n" \
146
- "URL: #{download_url}\n" \
147
- "You may need to build from source: cd rust && cargo build --release"
219
+ "URL: #{url}\n" \
220
+ "You may need to build from source: cd rust && cargo build --release --features #{feature_flag}"
148
221
  end
149
222
 
150
- extract_binary(response.body)
151
- puts "Installed to #{binary_path}"
152
- binary_path
223
+ extract_binary(response.body, target_path)
224
+ puts "Installed to #{target_path}"
225
+ target_path
153
226
  end
154
227
 
155
228
  private
156
229
 
230
+ # Validates the variant parameter.
231
+ #
232
+ # ==== Raises
233
+ #
234
+ # ArgumentError if the variant is not valid.
235
+ def validate_variant!(variant)
236
+ variant = variant.to_sym
237
+ unless VARIANTS.key?(variant)
238
+ raise ArgumentError, "Invalid variant: #{variant}. Valid variants: #{VARIANTS.keys.join(', ')}"
239
+ end
240
+ end
241
+
157
242
  # Fetches a URL, following redirects.
158
243
  def fetch_with_redirects(uri, limit = 10)
159
244
  raise "Too many redirects" if limit == 0
@@ -174,17 +259,17 @@ module ActiveStorage
174
259
  end
175
260
 
176
261
  # Extracts the binary from a gzipped tarball.
177
- def extract_binary(tarball_data)
262
+ def extract_binary(tarball_data, target_path)
178
263
  gz = Zlib::GzipReader.new(StringIO.new(tarball_data))
179
264
  tar = Gem::Package::TarReader.new(gz)
180
265
 
181
266
  tar.each do |entry|
182
267
  next unless entry.file? && entry.full_name == BINARY_NAME
183
268
 
184
- File.open(binary_path, "wb") do |f|
269
+ File.open(target_path, "wb") do |f|
185
270
  f.write(entry.read)
186
271
  end
187
- File.chmod(0o755, binary_path)
272
+ File.chmod(0o755, target_path)
188
273
  return
189
274
  end
190
275
 
@@ -17,6 +17,12 @@ module ActiveStorage
17
17
  # # Extract text from a file path
18
18
  # result = client.extract_text_from_path("/path/to/image.png")
19
19
  #
20
+ # # Use a specific OCR engine
21
+ # result = client.extract_text(document.file, engine: :leptess)
22
+ #
23
+ # # Compare results from both engines
24
+ # comparison = client.compare(document.file)
25
+ #
20
26
  # # Check server health
21
27
  # client.healthy? # => true
22
28
  #
@@ -37,6 +43,7 @@ module ActiveStorage
37
43
  # ==== Parameters
38
44
  #
39
45
  # * +blob+ - An ActiveStorage::Blob instance
46
+ # * +engine+ - OCR engine to use (:ocrs or :leptess). Defaults to configured engine.
40
47
  #
41
48
  # ==== Returns
42
49
  #
@@ -46,9 +53,9 @@ module ActiveStorage
46
53
  #
47
54
  # * ConnectionError - if the server is unreachable
48
55
  # * ServerError - if the server returns an error
49
- def extract_text(blob)
56
+ def extract_text(blob, engine: nil)
50
57
  blob.open do |file|
51
- extract_text_from_file(file, blob.content_type, blob.filename.to_s)
58
+ extract_text_from_file(file, blob.content_type, blob.filename.to_s, engine: engine)
52
59
  end
53
60
  end
54
61
 
@@ -59,6 +66,7 @@ module ActiveStorage
59
66
  # * +path+ - Path to the file
60
67
  # * +content_type+ - MIME type (auto-detected if not provided)
61
68
  # * +filename+ - Filename to send (defaults to basename of path)
69
+ # * +engine+ - OCR engine to use (:ocrs or :leptess). Defaults to configured engine.
62
70
  #
63
71
  # ==== Returns
64
72
  #
@@ -68,12 +76,12 @@ module ActiveStorage
68
76
  #
69
77
  # * ConnectionError - if the server is unreachable
70
78
  # * ServerError - if the server returns an error
71
- def extract_text_from_path(path, content_type: nil, filename: nil)
79
+ def extract_text_from_path(path, content_type: nil, filename: nil, engine: nil)
72
80
  content_type ||= Marcel::MimeType.for(Pathname.new(path))
73
81
  filename ||= File.basename(path)
74
82
 
75
83
  File.open(path, "rb") do |file|
76
- extract_text_from_file(file, content_type, filename)
84
+ extract_text_from_file(file, content_type, filename, engine: engine)
77
85
  end
78
86
  end
79
87
 
@@ -86,6 +94,7 @@ module ActiveStorage
86
94
  # * +file+ - An IO object (File, StringIO, etc.)
87
95
  # * +content_type+ - MIME type of the file
88
96
  # * +filename+ - Filename to send to the server
97
+ # * +engine+ - OCR engine to use (:ocrs or :leptess). Defaults to configured engine.
89
98
  #
90
99
  # ==== Returns
91
100
  #
@@ -95,8 +104,11 @@ module ActiveStorage
95
104
  #
96
105
  # * ConnectionError - if the server is unreachable
97
106
  # * ServerError - if the server returns an error
98
- def extract_text_from_file(file, content_type, filename)
99
- response = connection.post("/ocr") do |req|
107
+ def extract_text_from_file(file, content_type, filename, engine: nil)
108
+ target_engine = engine || @config.engine
109
+ endpoint = ocr_endpoint_for(target_engine)
110
+
111
+ response = connection.post(endpoint) do |req|
100
112
  req.body = {
101
113
  file: Faraday::Multipart::FilePart.new(
102
114
  file,
@@ -111,6 +123,57 @@ module ActiveStorage
111
123
  raise ConnectionError, "Failed to connect to OCR server: #{e.message}"
112
124
  end
113
125
 
126
+ # Compares OCR results from both engines.
127
+ #
128
+ # Runs OCR on the same file using both ocrs and leptess engines,
129
+ # allowing you to compare accuracy and performance.
130
+ #
131
+ # ==== Parameters
132
+ #
133
+ # * +blob+ - An ActiveStorage::Blob instance
134
+ #
135
+ # ==== Returns
136
+ #
137
+ # A Hash with :ocrs and :leptess keys, each containing a Result object.
138
+ #
139
+ # ==== Raises
140
+ #
141
+ # * ConnectionError - if the server is unreachable
142
+ # * ServerError - if the server returns an error
143
+ def compare(blob)
144
+ ocrs_result = extract_text(blob, engine: :ocrs)
145
+ leptess_result = extract_text(blob, engine: :leptess)
146
+
147
+ {
148
+ ocrs: ocrs_result,
149
+ leptess: leptess_result
150
+ }
151
+ end
152
+
153
+ # Compares OCR results from both engines using a file path.
154
+ #
155
+ # ==== Parameters
156
+ #
157
+ # * +path+ - Path to the file
158
+ # * +content_type+ - MIME type (auto-detected if not provided)
159
+ # * +filename+ - Filename to send (defaults to basename of path)
160
+ #
161
+ # ==== Returns
162
+ #
163
+ # A Hash with :ocrs and :leptess keys, each containing a Result object.
164
+ def compare_from_path(path, content_type: nil, filename: nil)
165
+ content_type ||= Marcel::MimeType.for(Pathname.new(path))
166
+ filename ||= File.basename(path)
167
+
168
+ ocrs_result = extract_text_from_path(path, content_type: content_type, filename: filename, engine: :ocrs)
169
+ leptess_result = extract_text_from_path(path, content_type: content_type, filename: filename, engine: :leptess)
170
+
171
+ {
172
+ ocrs: ocrs_result,
173
+ leptess: leptess_result
174
+ }
175
+ end
176
+
114
177
  # Checks if the OCR server is healthy.
115
178
  #
116
179
  # ==== Returns
@@ -143,6 +206,26 @@ module ActiveStorage
143
206
 
144
207
  private
145
208
 
209
+ # Returns the OCR endpoint path for the given engine.
210
+ #
211
+ # ==== Parameters
212
+ #
213
+ # * +engine+ - Engine name (:ocrs or :leptess)
214
+ #
215
+ # ==== Returns
216
+ #
217
+ # The endpoint path string (e.g., "/ocr" or "/ocr/leptess")
218
+ def ocr_endpoint_for(engine)
219
+ case engine.to_sym
220
+ when :ocrs
221
+ "/ocr"
222
+ when :leptess
223
+ "/ocr/leptess"
224
+ else
225
+ raise ArgumentError, "Unknown engine: #{engine}"
226
+ end
227
+ end
228
+
146
229
  # Returns the Faraday connection, creating it if necessary.
147
230
  def connection
148
231
  @connection ||= Faraday.new(url: @config.server_url) do |f|
@@ -166,7 +249,8 @@ module ActiveStorage
166
249
  text: data[:text],
167
250
  confidence: data[:confidence],
168
251
  processing_time_ms: data[:processing_time_ms],
169
- warnings: data[:warnings] || []
252
+ warnings: data[:warnings] || [],
253
+ engine: data[:engine]
170
254
  )
171
255
  end
172
256
  end
@@ -9,6 +9,7 @@ module ActiveStorage
9
9
  # ActiveStorage::Ocr.configure do |config|
10
10
  # config.server_url = "http://localhost:9292"
11
11
  # config.timeout = 60
12
+ # config.engine = :leptess # Use Tesseract engine instead of default ocrs
12
13
  # end
13
14
  #
14
15
  # == Environment Variables
@@ -16,8 +17,12 @@ module ActiveStorage
16
17
  # * +ACTIVESTORAGE_OCR_SERVER_URL+ - OCR server URL (default: http://localhost:9292)
17
18
  # * +ACTIVESTORAGE_OCR_TIMEOUT+ - Request timeout in seconds (default: 30)
18
19
  # * +ACTIVESTORAGE_OCR_OPEN_TIMEOUT+ - Connection timeout in seconds (default: 5)
20
+ # * +ACTIVESTORAGE_OCR_ENGINE+ - OCR engine to use: ocrs (default) or leptess
19
21
  #
20
22
  class Configuration
23
+ # Valid OCR engine names
24
+ VALID_ENGINES = %i[ocrs leptess].freeze
25
+
21
26
  # The URL of the OCR server.
22
27
  attr_accessor :server_url
23
28
 
@@ -30,6 +35,11 @@ module ActiveStorage
30
35
  # Array of MIME types that the analyzer will process.
31
36
  attr_accessor :content_types
32
37
 
38
+ # The OCR engine to use (:ocrs or :leptess).
39
+ # Default is :ocrs (pure Rust, no dependencies).
40
+ # Use :leptess for Tesseract-based OCR (better for messy images).
41
+ attr_reader :engine
42
+
33
43
  # Creates a new Configuration with default values.
34
44
  #
35
45
  # Defaults are read from environment variables if set.
@@ -38,6 +48,25 @@ module ActiveStorage
38
48
  @timeout = ENV.fetch("ACTIVESTORAGE_OCR_TIMEOUT", 30).to_i
39
49
  @open_timeout = ENV.fetch("ACTIVESTORAGE_OCR_OPEN_TIMEOUT", 5).to_i
40
50
  @content_types = default_content_types
51
+ self.engine = ENV.fetch("ACTIVESTORAGE_OCR_ENGINE", "ocrs").to_sym
52
+ end
53
+
54
+ # Set the OCR engine to use.
55
+ #
56
+ # ==== Parameters
57
+ #
58
+ # * +value+ - Engine name (:ocrs or :leptess)
59
+ #
60
+ # ==== Raises
61
+ #
62
+ # * +ArgumentError+ if an invalid engine name is provided
63
+ def engine=(value)
64
+ value = value.to_sym
65
+ unless VALID_ENGINES.include?(value)
66
+ raise ArgumentError, "Invalid engine: #{value}. Valid engines: #{VALID_ENGINES.join(', ')}"
67
+ end
68
+
69
+ @engine = value
41
70
  end
42
71
 
43
72
  # Returns the default list of supported content types.
@@ -13,6 +13,7 @@ module ActiveStorage
13
13
  # * +activestorage_ocr:start+ - Start the OCR server
14
14
  # * +activestorage_ocr:health+ - Check if the server is responding
15
15
  # * +activestorage_ocr:info+ - Show binary and platform information
16
+ # * +activestorage_ocr:compare+ - Compare OCR results from different engines
16
17
  #
17
18
  class Railtie < Rails::Railtie
18
19
  # Registers the OCR analyzer with Active Storage.
@@ -29,9 +30,23 @@ module ActiveStorage
29
30
  # Defines rake tasks for server management.
30
31
  rake_tasks do
31
32
  namespace :activestorage_ocr do
32
- desc "Install the OCR server binary"
33
+ desc "Install the OCR server binary (variant=ocrs|leptess|all, path=./bin/dist, source=local_binary_path)"
33
34
  task :install do
34
- ActiveStorage::Ocr::Binary.install!
35
+ path = ENV["path"]
36
+ variant = (ENV["variant"] || "ocrs").to_sym
37
+ source = ENV["source"]
38
+
39
+ if source
40
+ # Local development: copy from a local path instead of downloading
41
+ target_dir = path || ActiveStorage::Ocr::Binary.install_dir
42
+ target_path = File.join(target_dir, ActiveStorage::Ocr::Binary::BINARY_NAME)
43
+ FileUtils.mkdir_p(target_dir)
44
+ FileUtils.cp(source, target_path)
45
+ FileUtils.chmod(0o755, target_path)
46
+ puts "Copied #{source} to #{target_path}"
47
+ else
48
+ ActiveStorage::Ocr::Binary.install!(path: path, variant: variant)
49
+ end
35
50
  end
36
51
 
37
52
  desc "Check OCR server health"
@@ -41,7 +56,11 @@ module ActiveStorage
41
56
  puts "OCR server is healthy"
42
57
  info = client.server_info
43
58
  puts " Version: #{info[:version]}"
44
- puts " Supported formats: #{info[:supported_formats].join(', ')}"
59
+ puts " Default engine: #{info[:default_engine]}"
60
+ puts " Available engines:"
61
+ info[:available_engines].each do |engine|
62
+ puts " - #{engine[:name]}: #{engine[:description]}"
63
+ end
45
64
  else
46
65
  puts "OCR server is not responding"
47
66
  exit 1
@@ -53,7 +72,8 @@ module ActiveStorage
53
72
  binary = ActiveStorage::Ocr::Binary.binary_path
54
73
  unless File.executable?(binary)
55
74
  puts "Binary not found. Installing..."
56
- ActiveStorage::Ocr::Binary.install!
75
+ variant = (ENV["variant"] || "ocrs").to_sym
76
+ ActiveStorage::Ocr::Binary.install!(variant: variant)
57
77
  end
58
78
 
59
79
  config = ActiveStorage::Ocr.configuration
@@ -71,6 +91,59 @@ module ActiveStorage
71
91
  puts "Binary path: #{ActiveStorage::Ocr::Binary.binary_path}"
72
92
  puts "Installed: #{ActiveStorage::Ocr::Binary.installed?}"
73
93
  puts "Version: #{ActiveStorage::Ocr::Binary.version}"
94
+ puts ""
95
+ puts "Available variants:"
96
+ ActiveStorage::Ocr::Binary::VARIANTS.each do |name, info|
97
+ puts " #{name}: #{info[:description]}"
98
+ end
99
+ end
100
+
101
+ desc "Compare OCR engines on a file (file=/path/to/image.png)"
102
+ task compare: :environment do
103
+ file_path = ENV["file"]
104
+ unless file_path
105
+ puts "Usage: rake activestorage_ocr:compare file=/path/to/image.png"
106
+ exit 1
107
+ end
108
+
109
+ unless File.exist?(file_path)
110
+ puts "File not found: #{file_path}"
111
+ exit 1
112
+ end
113
+
114
+ client = ActiveStorage::Ocr::Client.new
115
+ puts "Comparing OCR engines on: #{file_path}"
116
+ puts ""
117
+
118
+ begin
119
+ comparison = client.compare_from_path(file_path)
120
+
121
+ comparison.each do |engine, result|
122
+ puts "#{engine.to_s.upcase}:"
123
+ puts " Text length: #{result.text.length} characters"
124
+ puts " Confidence: #{(result.confidence * 100).round(1)}%"
125
+ puts " Processing time: #{result.processing_time_ms}ms"
126
+ if result.warnings.any?
127
+ puts " Warnings: #{result.warnings.join(', ')}"
128
+ end
129
+ puts ""
130
+ end
131
+
132
+ # Summary
133
+ faster = comparison.min_by { |_, r| r.processing_time_ms }
134
+ higher_conf = comparison.max_by { |_, r| r.confidence }
135
+ puts "Summary:"
136
+ puts " Faster engine: #{faster[0]} (#{faster[1].processing_time_ms}ms)"
137
+ puts " Higher confidence: #{higher_conf[0]} (#{(higher_conf[1].confidence * 100).round(1)}%)"
138
+ rescue ActiveStorage::Ocr::ConnectionError => e
139
+ puts "Error: #{e.message}"
140
+ puts "Make sure the OCR server is running with both engines enabled."
141
+ exit 1
142
+ rescue ActiveStorage::Ocr::ServerError => e
143
+ puts "Server error: #{e.message}"
144
+ puts "This engine may not be available. Check server configuration."
145
+ exit 1
146
+ end
74
147
  end
75
148
  end
76
149
  end
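
Review note on the `compare` task above: the summary relies on `min_by` and `max_by` over a hash keyed by engine name, each of which returns a `[key, value]` pair. A small sketch of just that step follows, assuming `compare_from_path` returns such a hash (the diff implies this but does not show the client). `ComparedResult`, `summarize`, and the sample figures are invented for illustration.

```ruby
# Hypothetical stand-in for the gem's Result; only the fields the summary reads.
ComparedResult = Struct.new(:text, :confidence, :processing_time_ms, :warnings, keyword_init: true)

# Builds the two summary lines the rake task prints.
def summarize(comparison)
  faster = comparison.min_by { |_, r| r.processing_time_ms } # => [engine, result]
  higher_conf = comparison.max_by { |_, r| r.confidence }    # => [engine, result]

  [
    "Faster engine: #{faster[0]} (#{faster[1].processing_time_ms}ms)",
    "Higher confidence: #{higher_conf[0]} (#{(higher_conf[1].confidence * 100).round(1)}%)"
  ]
end

# Sample data is made up for the example.
comparison = {
  ocrs: ComparedResult.new(text: "Invoice 42", confidence: 0.5, processing_time_ms: 120, warnings: []),
  leptess: ComparedResult.new(text: "Invoice 42", confidence: 0.75, processing_time_ms: 340, warnings: [])
}

puts summarize(comparison)
# Faster engine: ocrs (120ms)
# Higher confidence: leptess (75.0%)
```

With an empty hash both `min_by` and `max_by` return `nil`, so the task as written depends on the server returning at least one engine result.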
@@ -27,6 +27,9 @@ module ActiveStorage
27
27
  # Array of warning messages from the OCR server.
28
28
  attr_reader :warnings
29
29
 
30
+ # The OCR engine that processed this result (e.g., "ocrs" or "leptess").
31
+ attr_reader :engine
32
+
30
33
  # Creates a new Result.
31
34
  #
32
35
  # ==== Parameters
@@ -35,11 +38,13 @@ module ActiveStorage
35
38
  # * +confidence+ - Confidence score (0.0 to 1.0)
36
39
  # * +processing_time_ms+ - Processing time in milliseconds
37
40
  # * +warnings+ - Array of warning messages (optional)
38
- def initialize(text:, confidence:, processing_time_ms:, warnings: [])
41
+ # * +engine+ - The OCR engine used (optional)
42
+ def initialize(text:, confidence:, processing_time_ms:, warnings: [], engine: nil)
39
43
  @text = text
40
44
  @confidence = confidence
41
45
  @processing_time_ms = processing_time_ms
42
46
  @warnings = warnings
47
+ @engine = engine
43
48
  end
44
49
 
45
50
  # Returns whether OCR successfully extracted text.
@@ -61,7 +66,8 @@ module ActiveStorage
61
66
  text: text,
62
67
  confidence: confidence,
63
68
  processing_time_ms: processing_time_ms,
64
- warnings: warnings
69
+ warnings: warnings,
70
+ engine: engine
65
71
  }
66
72
  end
67
73
 
@@ -71,11 +77,12 @@ module ActiveStorage
71
77
  #
72
78
  # ==== Returns
73
79
  #
74
- # A Hash with +:ocr_text+, +:ocr_confidence+, and +:ocr_processed_at+.
80
+ # A Hash with +:ocr_text+, +:ocr_confidence+, +:ocr_engine+, and +:ocr_processed_at+.
75
81
  def to_metadata
76
82
  {
77
83
  ocr_text: text,
78
84
  ocr_confidence: confidence,
85
+ ocr_engine: engine,
79
86
  ocr_processed_at: Time.now.utc.iso8601
80
87
  }
81
88
  end
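
Review note on the `Result` hunks above: `engine` is an optional keyword defaulting to `nil`, so existing callers keep working and `to_metadata` then stores `ocr_engine: nil`. Below is a minimal sketch of the new shape, limited to the fields shown in the diff. `SketchResult` is a hypothetical stand-in, and `require "time"` is added so `iso8601` is available outside Rails.

```ruby
require "time" # provides Time#iso8601 on Rubies where it is not in core

# Hypothetical stand-in mirroring the initializer and to_metadata from the diff.
class SketchResult
  attr_reader :text, :confidence, :processing_time_ms, :warnings, :engine

  def initialize(text:, confidence:, processing_time_ms:, warnings: [], engine: nil)
    @text = text
    @confidence = confidence
    @processing_time_ms = processing_time_ms
    @warnings = warnings
    @engine = engine
  end

  # Metadata hash as documented: text, confidence, engine, and a UTC timestamp.
  def to_metadata
    {
      ocr_text: text,
      ocr_confidence: confidence,
      ocr_engine: engine,
      ocr_processed_at: Time.now.utc.iso8601
    }
  end
end

with_engine = SketchResult.new(text: "Hello", confidence: 0.9, processing_time_ms: 80, engine: "leptess")
legacy_call = SketchResult.new(text: "Hello", confidence: 0.9, processing_time_ms: 80)

p with_engine.to_metadata[:ocr_engine] # "leptess"
p legacy_call.to_metadata[:ocr_engine] # nil
```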
@@ -2,6 +2,6 @@
2
2
 
3
3
  module ActiveStorage
4
4
  module Ocr
5
- VERSION = "0.1.1"
5
+ VERSION = "0.2.0"
6
6
  end
7
7
  end
@@ -19,14 +19,20 @@ module ActiveStorage
19
19
  # OCR support for Rails Active Storage attachments.
20
20
  #
21
21
  # This module provides optical character recognition (OCR) for files stored
22
- # with Active Storage using a high-performance Rust server with the pure-Rust
23
- # +ocrs+ OCR engine.
22
+ # with Active Storage using a high-performance Rust server.
23
+ #
24
+ # == OCR Engines
25
+ #
26
+ # Two engines are available:
27
+ # * +:ocrs+ - Pure Rust engine (default). Fast, no system dependencies.
28
+ # * +:leptess+ - Tesseract-based engine. Better for noisy/messy images.
24
29
  #
25
30
  # == Configuration
26
31
  #
27
32
  # ActiveStorage::Ocr.configure do |config|
28
33
  # config.server_url = "http://localhost:9292"
29
34
  # config.timeout = 30
35
+ # config.engine = :leptess # Use Tesseract instead of default ocrs
30
36
  # end
31
37
  #
32
38
  # == Basic Usage
@@ -35,6 +41,16 @@ module ActiveStorage
35
41
  # result = ActiveStorage::Ocr.extract_text(document.file)
36
42
  # result.text # => "Extracted text..."
37
43
  # result.confidence # => 0.95
44
+ # result.engine # => "ocrs"
45
+ #
46
+ # # Use a specific engine for one request
47
+ # client = ActiveStorage::Ocr::Client.new
48
+ # result = client.extract_text(document.file, engine: :leptess)
49
+ #
50
+ # # Compare both engines
51
+ # comparison = client.compare(document.file)
52
+ # comparison[:ocrs].text # => "Text from ocrs..."
53
+ # comparison[:leptess].text # => "Text from leptess..."
38
54
  #
39
55
  # == Error Handling
40
56
  #
@@ -0,0 +1,12 @@
1
+ Description:
2
+ Installs the OCR server binary and creates a wrapper script.
3
+
4
+ Example:
5
+ rails generate activestorage_ocr:install
6
+
7
+ This will create:
8
+ bin/activestorage-ocr-server - Wrapper script that runs the OCR server
9
+ bin/dist/activestorage-ocr-server - The actual binary (gitignored)
10
+
11
+ And update:
12
+ .gitignore - Adds /bin/dist/ to ignore the binary
@@ -0,0 +1,64 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "rails/generators"
4
+
5
+ module ActivestorageOcr
6
+ module Generators
7
+ # Installs the OCR server binary and creates a binstub.
8
+ #
9
+ # This generator creates a bin/activestorage-ocr-server script that
10
+ # automatically downloads the binary if needed and runs it.
11
+ #
12
+ # == Usage
13
+ #
14
+ # rails generate activestorage_ocr:install
15
+ #
16
+ # == What it does
17
+ #
18
+ # 1. Creates bin/activestorage-ocr-server (wrapper script)
19
+ # 2. Downloads binary to bin/dist/activestorage-ocr-server
20
+ # 3. Adds bin/dist/ to .gitignore
21
+ #
22
+ class InstallGenerator < Rails::Generators::Base
23
+ source_root File.expand_path("templates", __dir__)
24
+
25
+ desc "Installs the OCR server binary and creates a binstub"
26
+
27
+ def create_binstub
28
+ template "bin/activestorage-ocr-server", "bin/activestorage-ocr-server"
29
+ chmod "bin/activestorage-ocr-server", 0o755
30
+ end
31
+
32
+ def update_gitignore
33
+ gitignore_path = Rails.root.join(".gitignore")
34
+ return unless File.exist?(gitignore_path)
35
+
36
+ gitignore_content = File.read(gitignore_path)
37
+ return if gitignore_content.include?("/bin/dist")
38
+
39
+ append_to_file ".gitignore", "\n# OCR server binary\n/bin/dist/\n"
40
+ end
41
+
42
+ def download_binary
43
+ say "Downloading OCR server binary...", :green
44
+ require "activestorage/ocr/binary"
45
+ dist_dir = Rails.root.join("bin", "dist")
46
+ ActiveStorage::Ocr::Binary.install!(path: dist_dir.to_s)
47
+ end
48
+
49
+ def show_instructions
50
+ say ""
51
+ say "OCR server installed successfully!", :green
52
+ say ""
53
+ say "Add to your Procfile:", :yellow
54
+ say " ocr: bin/activestorage-ocr-server --host 127.0.0.1 --port 9292"
55
+ say ""
56
+ say "Configure the server URL in config/initializers/activestorage_ocr.rb:", :yellow
57
+ say " ActiveStorage::Ocr.configure do |config|"
58
+ say " config.server_url = ENV.fetch('ACTIVESTORAGE_OCR_SERVER_URL', 'http://127.0.0.1:9292')"
59
+ say " end"
60
+ say ""
61
+ end
62
+ end
63
+ end
64
+ end
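
Review note on `update_gitignore` above: the step is idempotent because it returns early when the file is missing or already mentions `/bin/dist`. Here is a standalone sketch of the same guard logic using plain `File` calls instead of the generator's `append_to_file`. The `ensure_gitignore_entry` helper and its return symbols are invented for illustration.

```ruby
require "tmpdir"

# Appends the ignore entry once; returns a symbol describing what happened.
# Hypothetical helper, mirroring the guards in the generator's update_gitignore.
def ensure_gitignore_entry(gitignore_path)
  return :missing unless File.exist?(gitignore_path)
  return :already_present if File.read(gitignore_path).include?("/bin/dist")

  File.open(gitignore_path, "a") { |f| f.write("\n# OCR server binary\n/bin/dist/\n") }
  :appended
end

Dir.mktmpdir do |dir|
  path = File.join(dir, ".gitignore")
  File.write(path, "/log/*\n")

  p ensure_gitignore_entry(path) # :appended
  p ensure_gitignore_entry(path) # :already_present
end
```

Note that the substring check also matches an existing line such as `/bin/dist-old`, in which case nothing is appended.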
@@ -0,0 +1,16 @@
1
+ #!/usr/bin/env bash
2
+ set -e
3
+
4
+ # Navigate to app root
5
+ cd "$(dirname "$0")/.."
6
+
7
+ BINARY_PATH="./bin/dist/activestorage-ocr-server"
8
+
9
+ # Download binary if not present
10
+ if [ ! -f "$BINARY_PATH" ]; then
11
+ echo "OCR server binary not found. Downloading..."
12
+ ./bin/rails activestorage_ocr:install path=./bin/dist
13
+ fi
14
+
15
+ # Execute the binary with all passed arguments
16
+ exec "$BINARY_PATH" "$@"
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: activestorage-ocr
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Michael Rispoli
@@ -112,6 +112,9 @@ files:
112
112
  - lib/activestorage/ocr/railtie.rb
113
113
  - lib/activestorage/ocr/result.rb
114
114
  - lib/activestorage/ocr/version.rb
115
+ - lib/generators/activestorage_ocr/install/USAGE
116
+ - lib/generators/activestorage_ocr/install/install_generator.rb
117
+ - lib/generators/activestorage_ocr/install/templates/bin/activestorage-ocr-server
115
118
  homepage: https://github.com/Cause-of-a-Kind/activestorage-ocr
116
119
  licenses:
117
120
  - MIT