reducto_ai 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 00ad6f946c48de4c4638894e5cce87f594601c85b47ab09714289a21111f83e5
4
+ data.tar.gz: 4ffffbf8a965b92f7e3395604e6be7736764d37a741e7ca589cec36adf2f5dc9
5
+ SHA512:
6
+ metadata.gz: 5a3a7fd5765ef62bd9e9ea56c137c8d84b75741c9dce2c152ecb19b6d95c08f489a64994d71ca4ceb513c3e1144d45cc057232f18f81c546529a5fe1ec0c999c
7
+ data.tar.gz: 205ba689712f16f646eba2e00f29b93e579adc68aa2966c5985dd55b5802afd0ede7b441f7eaaf1b4d3578e273d48dfe7f6f15f9667bf7c1e8af3acb0ebeb947
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2025-10-17
4
+
5
+ - Initial release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 dpaluy
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,451 @@
1
+ # ReductoAi
2
+
3
+ Ruby wrapper for the [Reducto API](https://docs.reducto.ai/api-reference).
4
+
5
+ ## Installation
6
+
7
+ ```
8
+ bundle add reducto_ai
9
+ ```
10
+
11
+ ## Usage
12
+
13
+ Configure once:
14
+
15
+ ```ruby
16
+ ReductoAI.configure do |config|
17
+ config.api_key = ENV.fetch("REDUCTO_API_KEY")
18
+ end
19
+ ```
20
+
21
+ ### Choosing an action
22
+
23
+ - **Parse**: Start here for any document. Converts uploads or URLs into structured chunks and OCR text so later steps can reuse the returned `job_id`.
24
+ - **Split**: Use after parsing when you need logical sections. Provide `split_description` names/rules to segment the parsed document into labeled ranges.
25
+ - **Extract**: Run when you need structured answers (fields, JSON). Supply instructions or schema to pull values from raw input or an existing parse `job_id`.
26
+ - **Edit**: Generate marked-up PDFs from a document URL and edit instructions (`client.edit` sends `input` as `document_url` and `instructions` as `edit_instructions`; PDF forms are supported via `form_schema`).
27
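+ - **Pipeline**: Trigger a saved Studio pipeline with `input` + `pipeline_id` to orchestrate Parse/Split/Extract/Edit in one call. Note that this release's `client.pipeline.sync`/`async` also require a non-empty `steps:` argument.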
28
+
29
+ ### Async Operations
30
+
31
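+ The processing resources (`parse`, `extract`, `split`, `edit`, `pipeline`) each provide an `async` variant that returns a `job_id` you can poll with `client.jobs.retrieve`: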
32
+
33
+ ```ruby
34
+ client = ReductoAI::Client.new
35
+
36
+ # Start async parse
37
+ # API Reference: https://docs.reducto.ai/api-reference/parse-async
38
+ job = client.parse.async(input: "https://example.com/large-doc.pdf")
39
+ job_id = job["job_id"]
40
+
41
+ # Response:
42
+ # {
43
+ # "job_id" => "async-123",
44
+ # "status" => "processing"
45
+ # }
46
+
47
+ # Poll for completion
48
+ # API Reference: https://docs.reducto.ai/api-reference/get-job
49
+ result = client.jobs.retrieve(job_id: job_id)
50
+
51
+ # Response:
52
+ # {
53
+ # "job_id" => "async-123",
54
+ # "status" => "complete",
55
+ # "result" => {...},
56
+ # "usage" => {"credits" => 1.0}
57
+ # }
58
+
59
+ # Or configure webhooks for notifications
60
+ # API Reference: https://docs.reducto.ai/api-reference/webhook-portal
61
+ client.jobs.configure_webhook
62
+ ```
63
+
64
+ Available async methods:
65
+ - `client.parse.async(input:, **options)` - [Parse Async API](https://docs.reducto.ai/api-reference/parse-async)
66
+ - `client.extract.async(input:, instructions:, **options)` - [Extract Async API](https://docs.reducto.ai/api-reference/extract-async)
67
+ - `client.split.async(input:, **options)` - [Split Async API](https://docs.reducto.ai/api-reference/split-async)
68
+ - `client.edit.async(input:, instructions:, **options)` - [Edit Async API](https://docs.reducto.ai/api-reference/edit-async)
69
+ - `client.pipeline.async(input:, steps:, **options)` - [Pipeline Async API](https://docs.reducto.ai/api-reference/pipeline-async)
70
+
71
+ ### Rails
72
+
73
+ Create `config/initializers/reducto_ai.rb`:
74
+
75
+ ```ruby
76
+ ReductoAI.configure do |c|
77
+ c.api_key = Rails.application.credentials.dig(:reducto, :api_key)
78
+ # c.base_url = "https://platform.reducto.ai"
79
+ # c.open_timeout = 5; c.read_timeout = 30
80
+ end
81
+
82
+ # Optional: build a dedicated client for a tenant or with custom timeouts
83
+ # client = ReductoAI::Client.new(api_key: ..., read_timeout: 10)
84
+ ```
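If different tenants use different API keys, one option is to memoize one client per tenant instead of rebuilding it on every request. The registry below is a hypothetical sketch (`TenantClients` is not part of the gem); its `factory` callable stands in for `ReductoAI::Client.new` so the example runs on its own.

```ruby
# Hypothetical per-tenant client registry (not part of reducto_ai).
# In an app, `factory` would be something like:
#   ->(api_key) { ReductoAI::Client.new(api_key: api_key, read_timeout: 10) }
class TenantClients
  def initialize(factory:)
    @factory = factory
    @clients = {}
    @mutex = Mutex.new # app servers may call this from several threads
  end

  # Returns the memoized client for a tenant, building it on first use.
  def fetch(tenant_id, api_key)
    @mutex.synchronize { @clients[tenant_id] ||= @factory.call(api_key) }
  end
end
```

The cache is keyed by tenant only, so rotating a tenant's API key would need an explicit eviction, which this sketch does not provide.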
85
+
86
+ ### Quick Start
87
+
88
+ ```ruby
89
+ client = ReductoAI::Client.new
90
+
91
+ # Parse a document
92
+ # API Reference: https://docs.reducto.ai/api-reference/parse
93
+ parse = client.parse.sync(input: "https://example.com/invoice.pdf")
94
+ job_id = parse["job_id"]
95
+
96
+ # Response:
97
+ # {
98
+ # "job_id" => "abc-123",
99
+ # "status" => "complete",
100
+ # "result" => {...}
101
+ # }
102
+
103
+ # Extract structured data
104
+ # API Reference: https://docs.reducto.ai/api-reference/extract
105
+ extraction = client.extract.sync(
106
+ input: job_id,
107
+ instructions: {
108
+ schema: {
109
+ type: "object",
110
+ properties: {
111
+ invoice_number: { type: "string" },
112
+ total_due: { type: "string" }
113
+ },
114
+ required: ["invoice_number", "total_due"]
115
+ }
116
+ }
117
+ )
118
+
119
+ # Response:
120
+ # {
121
+ # "job_id" => "820dca1b-3215-4d24-be09-6494d4c3cd88",
122
+ # "usage" => {"num_pages" => 1, "num_fields" => 2, "credits" => 2.0},
123
+ # "studio_link" => "https://studio.reducto.ai/job/820dca1b-3215-4d24-be09-6494d4c3cd88",
124
+ # "result" => [{"invoice_number" => "INV-2024-001", "total_due" => "$1,234.56"}],
125
+ # "citations" => nil
126
+ # }
127
+ ```
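The invoice schema above is repeated in later examples. If you build schemas in several places, a tiny helper keeps them consistent. This is a hypothetical convenience, not part of the gem; it produces exactly the hash passed as `instructions: { schema: ... }` above.

```ruby
# Hypothetical helper (not part of reducto_ai): builds a JSON Schema object
# in which every listed field is a required string.
def string_fields_schema(*fields)
  {
    type: "object",
    properties: fields.to_h { |field| [field.to_sym, { type: "string" }] },
    required: fields.map(&:to_s)
  }
end

schema = string_fields_schema(:invoice_number, :total_due)
# Used the same way as the literal hash:
#   client.extract.sync(input: job_id, instructions: { schema: schema })
```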
128
+
129
+ ### Complete Example: Multi-invoice Processing
130
+
131
+ ```ruby
132
+ client = ReductoAI::Client.new
133
+
134
+ # 1. Parse the document
135
+ # API Reference: https://docs.reducto.ai/api-reference/parse
136
+ parse = client.parse.sync(input: "https://example.com/invoices.pdf")
137
+
138
+ # Response:
139
+ # {
140
+ # "job_id" => "parse-123",
141
+ # "status" => "complete",
142
+ # "result" => {...}
143
+ # }
144
+
145
+ # 2. Split into individual invoices
146
+ # API Reference: https://docs.reducto.ai/api-reference/split
147
+ split = client.split.sync(
148
+ input: parse["job_id"],
149
+ split_description: [
150
+ {
151
+ name: "Invoice",
152
+ description: "All pages that belong to a single invoice",
153
+ partition_key: "invoice_number"
154
+ }
155
+ ],
156
+ split_rules: <<~PROMPT
157
+ The document contains multiple invoices one after another. Each invoice has a unique invoice number formatted like "Invoice #12345" near the top of the first page.
158
+ Segment the document into one partition per invoice. Keep pages contiguous per invoice and include any following appendices until the next invoice number.
159
+ Name each partition using the exact invoice number you detect (e.g., "Invoice #12345").
160
+ PROMPT
161
+ )
162
+
163
+ # Response:
164
+ # {
165
+ # "job_id" => "split-456",
166
+ # "result" => {
167
+ # "splits" => [{
168
+ # "name" => "Invoice",
169
+ # "partitions" => [
170
+ # {"name" => "Invoice #12345", "pages" => [1, 2, 3]},
171
+ # {"name" => "Invoice #12346", "pages" => [4, 5]}
172
+ # ]
173
+ # }]
174
+ # }
175
+ # }
176
+
177
+ # 3. Extract data from each invoice
178
+ # API Reference: https://docs.reducto.ai/api-reference/extract
179
+ invoice_partitions = split.dig("result", "splits").first.fetch("partitions")
180
+ invoice_details = invoice_partitions.map do |partition|
181
+ client.extract.sync(
182
+ input: parse["job_id"],
183
+ instructions: {
184
+ schema: {
185
+ type: "object",
186
+ properties: {
187
+ invoice_number: { type: "string" },
188
+ total_due: { type: "string" }
189
+ },
190
+ required: ["invoice_number", "total_due"]
191
+ }
192
+ },
193
+ settings: { page_range: partition["pages"] }
194
+ )
195
+ end
196
+
197
+ # Response per invoice:
198
+ # {
199
+ # "job_id" => "extract-789",
200
+ # "result" => [{"invoice_number" => "INV-12345", "total_due" => "$2,500.00"}],
201
+ # "usage" => {"credits" => 2.0}
202
+ # }
203
+ ```
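To store or log which pages belong to which invoice, the split response can be flattened into a name-to-pages hash. The sketch below assumes the response shape shown in the sample above (`result.splits[0].partitions[*]` with `name` and `pages`); `pages_by_partition` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of reducto_ai): maps each partition name of the
# first split category to its page list. A later partition with a duplicate
# name overwrites an earlier one, so check uniqueness if that matters.
def pages_by_partition(split_response)
  splits = split_response.dig("result", "splits") || []
  return {} if splits.empty?

  splits.first.fetch("partitions", []).to_h { |part| [part.fetch("name"), part.fetch("pages")] }
end
```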
204
+
205
+ ### Direct Split Example
206
+
207
+ Split a multi-invoice PDF directly without pre-parsing:
208
+
209
+ ```ruby
210
+ client = ReductoAI::Client.new
211
+
212
+ # Split document directly from URL
213
+ # API Reference: https://docs.reducto.ai/api-reference/split
214
+ response = client.split.sync(
215
+ input: { url: "https://example.com/invoices.pdf" },
216
+ split_description: [
217
+ {
218
+ name: "Invoice",
219
+ description: "Individual invoices within the document",
220
+ partition_key: "invoice_number"
221
+ }
222
+ ]
223
+ )
224
+
225
+ # Response:
226
+ # {
227
+ # "usage" => {"num_pages" => 2, "credits" => nil},
228
+ # "result" => {
229
+ # "section_mapping" => nil,
230
+ # "splits" => [{
231
+ # "name" => "Invoice",
232
+ # "pages" => [1, 2],
233
+ # "conf" => "high",
234
+ # "partitions" => [
235
+ # {"name" => "0000569050-001", "pages" => [1], "conf" => "high"},
236
+ # {"name" => "0000569050-002", "pages" => [2], "conf" => "high"}
237
+ # ]
238
+ # }]
239
+ # }
240
+ # }
241
+
242
+ # Access partitions
243
+ partitions = response.dig("result", "splits").first["partitions"]
244
+ # => [{"name"=>"0000569050-001", "pages"=>[1], "conf"=>"high"}, ...]
245
+ ```
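Because split partitions are produced by a model, it can be worth checking that every page landed in exactly one partition before extracting from them. This sketch is hypothetical (not part of the gem) and assumes 1-indexed page numbers and the `usage.num_pages` field, as in the sample response above.

```ruby
# Hypothetical sanity check (not part of reducto_ai): compares the pages covered
# by the first split's partitions with the page count reported under "usage".
def partition_coverage(split_response)
  total = split_response.dig("usage", "num_pages").to_i
  partitions = split_response.dig("result", "splits")&.first&.fetch("partitions", []) || []
  pages = partitions.flat_map { |part| part.fetch("pages") }

  {
    missing: (1..total).to_a - pages,                                  # pages in no partition
    duplicated: pages.tally.select { |_page, count| count > 1 }.keys   # pages in several
  }
end
```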
246
+
247
+ ### Document Classification Example
248
+
249
+ ```ruby
250
+ client = ReductoAI::Client.new
251
+
252
+ # Parse document
253
+ # API Reference: https://docs.reducto.ai/api-reference/parse
254
+ parse = client.parse.sync(input: "https://example.com/document.pdf")
255
+
256
+ # Extract with classification
257
+ # API Reference: https://docs.reducto.ai/api-reference/extract
258
+ extraction = client.extract.sync(
259
+ input: parse["job_id"],
260
+ instructions: {
261
+ schema: {
262
+ type: "object",
263
+ properties: {
264
+ document_type: {
265
+ type: "string",
266
+ enum: ["invoice", "credit", "debit"],
267
+ description: "Document category"
268
+ },
269
+ document_number: {
270
+ type: "string",
271
+ description: "Invoice number or equivalent identifier"
272
+ }
273
+ },
274
+ required: ["document_type", "document_number"]
275
+ }
276
+ },
277
+ settings: { citations: { enabled: false } }
278
+ )
279
+
280
+ # Response:
281
+ # {
282
+ # "job_id" => "class-123",
283
+ # "result" => [{"document_type" => "invoice", "document_number" => "INV-2024-001"}],
284
+ # "usage" => {"credits" => 2.0}
285
+ # }
286
+
287
+ document_type = extraction.dig("result", 0, "document_type")
288
+ document_number = extraction.dig("result", 0, "document_number")
289
+ ```
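The `enum` in the schema describes the allowed values, but downstream code is safer if it validates them again before routing a document. A minimal sketch, assuming the response shape shown above; `classify` and `DOCUMENT_TYPES` are hypothetical names, not part of the gem.

```ruby
# Hypothetical routing step (not part of reducto_ai). The schema's enum guides
# the extraction, but this code does not assume it is always honored.
DOCUMENT_TYPES = %w[invoice credit debit].freeze

def classify(extraction)
  record = extraction.dig("result", 0) || {}
  type = record["document_type"]
  raise ArgumentError, "unexpected document_type: #{type.inspect}" unless DOCUMENT_TYPES.include?(type)

  { type: type.to_sym, number: record.fetch("document_number") }
end
```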
290
+
291
+ ### API Reference
292
+
293
+ Full endpoint details live in the [Reducto API documentation](https://docs.reducto.ai/).
294
+
295
+ ### Best Practices: Cost-Efficient Document Processing
296
+
297
+ Follow these patterns to minimize credit usage when processing documents:
298
+
299
+ #### 1. Parse Once, Reuse Everywhere
300
+
301
+ **❌ Expensive:** Calling extract/split with URLs directly
302
+
303
+ ```ruby
304
+ # DON'T: Each operation parses the document again
305
+ extract1 = client.extract.sync(input: url, instructions: schema_a) # Parse + Extract = 2 credits
306
+ extract2 = client.extract.sync(input: url, instructions: schema_b) # Parse + Extract = 2 credits
307
+ split = client.split.sync(input: url, split_description: [...]) # Parse + Split = 3 credits
308
+ # Total: 7 credits for a 1-page document
309
+ ```
310
+
311
+ **✅ Cost-efficient:** Parse once, reuse `job_id`
312
+
313
+ ```ruby
314
+ # DO: Parse once, reuse the job_id
315
+ parse = client.parse.sync(input: url) # 1 credit
316
+ job_id = parse["job_id"]
317
+
318
+ extract1 = client.extract.sync(input: job_id, instructions: schema_a) # 1 credit
319
+ extract2 = client.extract.sync(input: job_id, instructions: schema_b) # 1 credit
320
+ split = client.split.sync(input: job_id, split_description: [...]) # 2 credits
321
+ # Total: 5 credits for a 1-page document (saved 2 credits)
322
+ ```
323
+
324
+ #### 2. Split Before Extract for Multi-Document Files
325
+
326
+ **✅ Best practice:** Split first, then extract per partition
327
+
328
+ ```ruby
329
+ # 1. Parse the document once
330
+ parse = client.parse.sync(input: "https://example.com/multi-invoice.pdf") # 1 credit × 10 pages = 10 credits
331
+ job_id = parse["job_id"]
332
+
333
+ # 2. Split into partitions
334
+ split = client.split.sync(
335
+ input: job_id,
336
+ split_description: [{ name: "Invoice", description: "..." }]
337
+ ) # 2 credits × 10 pages = 20 credits
338
+
339
+ # 3. Extract only from specific partitions
340
+ partitions = split.dig("result", "splits").first["partitions"]
341
+ invoices = partitions.map do |partition|
342
+ client.extract.sync(
343
+ input: job_id,
344
+ instructions: { schema: invoice_schema },
345
+ settings: { page_range: partition["pages"] } # Extract only relevant pages
346
+ )
347
+ end # 1 credit × 10 pages = 10 credits
348
+
349
+ # Total: 40 credits for a 10-page document with 5 invoices
350
+ ```
351
+
352
+ #### 3. Use Async for Large Documents
353
+
354
+ **✅ For large documents (roughly > 10 pages):** Use async so long parses don't run into the client's read timeout (30 seconds by default)
355
+
356
+ ```ruby
357
+ # Parse async for large files
358
+ job = client.parse.async(input: large_pdf_url)
359
+ job_id = job["job_id"]
360
+
361
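+ status = nil # poll for up to ~5 minutes (150 × 2 s), or use webhooks
361
+ 150.times do
362
+ status = client.jobs.retrieve(job_id: job_id)["status"]
363
+ break if status == "complete"
364
+ sleep 2
365
+ end
366
+ raise "Parse job #{job_id} did not complete (last status: #{status})" unless status == "complete"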
368
+ # Then reuse the job_id for split/extract
369
+ split = client.split.sync(input: job_id, split_description: [...])
370
+ ```
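A reusable polling helper can add a deadline, backoff, and failure detection. This is a hypothetical sketch, not part of the gem: `fetch` is any callable returning the job hash, and the status strings are parameters that default to the values in this README's sample responses (check the [get-job reference](https://docs.reducto.ai/api-reference/get-job) for the exact values the API returns).

```ruby
# Hypothetical polling helper (not part of reducto_ai).
# With the gem: wait_for_job(-> { client.jobs.retrieve(job_id: job_id) })
# Status values are assumptions; adjust `done:` / `failed:` to the API's values.
def wait_for_job(fetch, done: "complete", failed: ["failed"], timeout: 300,
                 interval: 2, max_interval: 30, sleeper: ->(seconds) { sleep(seconds) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    job = fetch.call
    return job if job["status"] == done
    raise "job failed: #{job.inspect}" if failed.include?(job["status"])

    # The deadline is checked before each sleep, so the wait can overshoot it
    # by at most one interval.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout}s waiting for job"
    end

    sleeper.call(interval)
    interval = [interval * 2, max_interval].min # exponential backoff with a cap
  end
end
```

The injectable `sleeper` keeps the helper testable without real waits.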
371
+
372
+ #### 4. Store and Reuse Parse Results
373
+
374
+ **✅ For repeated processing:** Store `job_id` to avoid re-parsing
375
+
376
+ ```ruby
377
+ # Store the job_id with your document record
378
+ document.update(reducto_job_id: parse["job_id"])
379
+
380
+ # Later: Extract different schemas without re-parsing
381
+ result_v1 = client.extract.sync(input: document.reducto_job_id, instructions: schema_v1)
382
+ result_v2 = client.extract.sync(input: document.reducto_job_id, instructions: schema_v2)
383
+ # Only 2 credits instead of 4
384
+ ```
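If the same documents are processed repeatedly, a small cache keyed by document can make the reuse automatic. This is a hypothetical sketch (`ParseJobCache` is not part of the gem). It assumes a parse `job_id` stays usable for as long as you cache it and that the content behind a URL does not change; key on a digest of the file contents instead if it might.

```ruby
require "digest"

# Hypothetical cache (not part of reducto_ai): remembers the parse job_id per
# document URL so later extracts reuse it instead of re-parsing.
# `store` is any Hash-like object (a Hash here; a DB column or cache in an app).
# `parse` is a callable, e.g. ->(url) { client.parse.sync(input: url)["job_id"] }
class ParseJobCache
  def initialize(parse:, store: {})
    @parse = parse
    @store = store
  end

  def job_id_for(document_url)
    key = "reducto:parse:#{Digest::SHA256.hexdigest(document_url)}"
    @store[key] ||= @parse.call(document_url)
  end
end
```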
385
+
386
+ #### Credit Math Summary
387
+
388
+ | Operation | Direct URL | With job_id | Savings |
389
+ |-----------|-----------|-------------|---------|
390
+ | Parse | 1 credit/page | N/A | - |
391
+ | Extract | 2 credits/page | 1 credit/page | 50% |
392
+ | Split | 3 credits/page | 2 credits/page | 33% |
393
+ | Multiple extracts (3×) | 6 credits/page | 3 credits/page | 50% |
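
The arithmetic in this section can be checked with a small calculator. This sketch hard-codes the per-page rates from the summary table above, which are the README's simplified model (the published rates in the next section differ), so treat its output as illustrative rather than as Reducto's billing logic. `credits` is a hypothetical helper.

```ruby
# Per-page rates copied from the "Credit Math Summary" table above.
# Illustrative figures only, not an authoritative price list.
RATES = {
  direct: { extract: 2, split: 3 },          # input is a URL: parsing is bundled in
  reuse: { parse: 1, extract: 1, split: 2 }  # one parse, then operations on its job_id
}.freeze

# ops: operations run on the document, e.g. %i[extract extract split]
def credits(ops, pages:, reuse:)
  if reuse
    rates = RATES[:reuse]
    pages * (rates[:parse] + ops.sum { |op| rates.fetch(op) })
  else
    pages * ops.sum { |op| RATES[:direct].fetch(op) }
  end
end

credits(%i[extract extract split], pages: 1, reuse: false) # => 7, as in pattern 1
credits(%i[extract extract split], pages: 1, reuse: true)  # => 5
credits(%i[split extract], pages: 10, reuse: true)         # => 40, as in pattern 2
```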
394
+
395
+ **Golden rule:** Always parse once and reuse `job_id` for all subsequent operations.
396
+
397
+ ### Credits & pricing overview
398
+
399
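+ Reducto bills every API call in credits. The published rates at the time of writing were as follows (see the [credit usage guide](https://docs.reducto.ai/faq/credit-usage-overview) for current figures; they differ from the simplified per-step numbers in the Credit Math Summary above):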
400
+
401
+ - **Parse**: 1 credit per standard page (2 for complex VLM-enhanced pages).
402
+ - **Extract**: 2 credits per page (4 if agent-in-loop mode is enabled). Parsing credits are also charged if you **don't** reuse a previous `job_id`.
403
+ - **Split**: 2 credits per page when run standalone; free if you supply a prior parse job.
404
+ - **Edit**: 4 credits per page (beta pricing).
405
+
406
+ According to [Reducto's pricing page](https://reducto.ai/pricing) at the time of writing, about 15k credits per month are included before overages, and additional credits are billed at **$0.015 USD** each.
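
To turn a monthly credit count into an overage estimate, subtract the included credits and multiply by the per-credit price. The defaults below are the figures quoted above and will go stale; `monthly_overage_usd` is a hypothetical helper, and `Rational` keeps the currency arithmetic exact.

```ruby
# Hypothetical estimate (not part of reducto_ai). Defaults are the figures quoted
# above (about 15,000 included credits per month, $0.015 per extra credit);
# plans and prices change, so pass current values explicitly.
def monthly_overage_usd(credits_used, included: 15_000, price_per_credit: Rational("0.015"))
  extra_credits = [credits_used - included, 0].max
  extra_credits * price_per_credit # Rational avoids floating-point rounding
end

format("$%.2f", monthly_overage_usd(20_000)) # => "$75.00"
```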
407
+
408
+ #### Why Extract costs 2 credits for 1 page
409
+
410
+ When you call `extract.sync(input: url, instructions: schema)` with a URL instead of a `job_id`, Reducto automatically performs two operations:
411
+
412
+ 1. **Parse** (1 credit): Converts PDF → structured text
413
+ 2. **Extract** (1 credit): Applies schema → structured JSON
414
+ 3. **Total: 2 credits**
415
+
416
+ **Cost optimization:** Parse once, extract multiple times:
417
+
418
+ ```ruby
419
+ # Parse once (1 credit)
420
+ parse = client.parse.sync(input: "https://example.com/doc.pdf")
421
+ job_id = parse["job_id"]
422
+
423
+ # Extract multiple schemas (1 credit each)
424
+ result_a = client.extract.sync(input: job_id, instructions: schema_a)
425
+ result_b = client.extract.sync(input: job_id, instructions: schema_b)
426
+ # Total: 3 credits instead of 4
427
+ ```
428
+
429
+ #### Credit math for the examples above
430
+
431
+ - **Parse → Split → Extract**: when you start with `client.parse.sync` and pass the resulting `job_id` to `split` and `extract`, you pay **1 + 2** = **3 credits per page** (parse + extract). Split reuses the parsed content, so it adds no extra parse credits.
432
+ - **Document type + number extraction**: the JSON-schema `extract` call uses an existing parse job, so it consumes **parse (1) + extract (2) = 3 credits per page**. Enabling agent-in-loop mode or citations may raise the per-page cost; see the [credit usage guide](https://docs.reducto.ai/faq/credit-usage-overview).
433
+
434
+ ## Development
435
+
436
+ ```
437
+ bundle exec rake test
438
+ bundle exec rubocop
439
+ ```
440
+
441
+ ## TODO
442
+
443
+ - [ ] Document webhook workflow and retry semantics
444
+
445
+ ## Contributing
446
+
447
+ Bug reports and pull requests are welcome on GitHub at https://github.com/dpaluy/reducto_ai.
448
+
449
+ ## License
450
+
451
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "minitest/test_task"
5
+
6
+ Minitest::TestTask.create
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[test rubocop]
data/lib/reducto_ai/client.rb ADDED
@@ -0,0 +1,174 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "faraday"
4
+ require "json"
5
+ require "faraday/multipart"
6
+ require_relative "resources/parse"
7
+ require_relative "resources/extract"
8
+ require_relative "resources/split"
9
+ require_relative "resources/edit"
10
+ require_relative "resources/pipeline"
11
+ require_relative "resources/jobs"
12
+
13
+ module ReductoAI
14
+ class Client
15
+ attr_reader :api_key, :base_url, :logger, :open_timeout, :read_timeout
16
+
17
+ def initialize(api_key: nil, base_url: nil, logger: nil, open_timeout: nil, read_timeout: nil)
18
+ configuration = ReductoAI.config
19
+
20
+ @api_key = api_key || configuration.api_key
21
+ @base_url = base_url || configuration.base_url
22
+ @logger = logger || configuration.logger
23
+ @open_timeout = open_timeout || configuration.open_timeout
24
+ @read_timeout = read_timeout || configuration.read_timeout
25
+
26
+ raise ArgumentError, "Missing API key for ReductoAI" if @api_key.to_s.empty?
27
+ end
28
+
29
+ def parse
30
+ @parse ||= Resources::Parse.new(self)
31
+ end
32
+
33
+ def extract
34
+ @extract ||= Resources::Extract.new(self)
35
+ end
36
+
37
+ def split
38
+ @split ||= Resources::Split.new(self)
39
+ end
40
+
41
+ def edit
42
+ @edit ||= Resources::Edit.new(self)
43
+ end
44
+
45
+ def pipeline
46
+ @pipeline ||= Resources::Pipeline.new(self)
47
+ end
48
+
49
+ def jobs
50
+ @jobs ||= Resources::Jobs.new(self)
51
+ end
52
+
53
+ def request(method, path, body: nil, params: nil)
54
+ response = execute_request(method, path, body: body, params: params)
55
+ log_response(method, path, response)
56
+ handle_response(response)
57
+ rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
58
+ raise NetworkError, "Network error: #{e.message}"
59
+ end
60
+
61
+ def post(path, body)
62
+ request(:post, path, body: body)
63
+ end
64
+
65
+ private
66
+
67
+ def connection
68
+ @connection ||= Faraday.new(url: base_url) do |faraday|
69
+ faraday.options.timeout = read_timeout
70
+ faraday.options.open_timeout = open_timeout
71
+ faraday.request :multipart
72
+ faraday.adapter Faraday.default_adapter
73
+ end
74
+ end
75
+
76
+ def handle_response(response)
77
+ status = response.status
78
+ body = response.body
79
+ return coerce_body(body) if success?(status)
80
+
81
+ parsed_body = parse_error_body(body)
82
+ return handle_auth_error(parsed_body, status) if status == 401
83
+ return handle_client_error(parsed_body, status) if client_error?(status)
84
+ return handle_server_error(parsed_body, status) if server_error?(status)
85
+
86
+ raise Error.new(error_message(status, parsed_body), status: status, body: parsed_body)
87
+ end
88
+
89
+ def coerce_body(body)
90
+ return {} if body.nil? || body.empty?
91
+ return body unless body.is_a?(String)
92
+
93
+ JSON.parse(body)
94
+ end
95
+
96
+ def parse_error_body(body)
97
+ return body if body.is_a?(Hash)
98
+ return body if body.nil? || body.empty?
99
+
100
+ JSON.parse(body)
101
+ rescue JSON::ParserError
102
+ body
103
+ end
104
+
105
+ def error_message(status, body)
106
+ detail = body.is_a?(Hash) ? (body["error"] || body["message"] || body.to_json) : body.to_s
107
+ "HTTP #{status}: #{detail}"
108
+ end
109
+
110
+ def log_response(method, path, response)
111
+ return unless logger
112
+
113
+ logger.debug("ReductoAI #{method.to_s.upcase} #{path} -> #{response.status}")
114
+ end
115
+
116
+ def success?(status)
117
+ (200..299).cover?(status)
118
+ end
119
+
120
+ def client_error?(status)
121
+ [400, 404, 422].include?(status)
122
+ end
123
+
124
+ def server_error?(status)
125
+ (500..599).cover?(status)
126
+ end
127
+
128
+ def handle_auth_error(body, status)
129
+ raise AuthenticationError.new("Unauthorized (401): check API key", status: status, body: body)
130
+ end
131
+
132
+ def handle_client_error(body, status)
133
+ raise ClientError.new(error_message(status, body), status: status, body: body)
134
+ end
135
+
136
+ def handle_server_error(body, status)
137
+ raise ServerError.new(error_message(status, body), status: status, body: body)
138
+ end
139
+
140
+ def execute_request(method, path, body:, params:)
141
+ connection.public_send(method, path) do |req|
142
+ apply_headers(req)
143
+ apply_body(req, body) if body
144
+ req.params.update(params) if params
145
+ end
146
+ end
147
+
148
+ def apply_headers(request)
149
+ request.headers["Authorization"] = "Bearer #{api_key}"
150
+ request.headers["Accept"] = "application/json"
151
+ end
152
+
153
+ def apply_body(request, body)
154
+ if multipart_body?(body)
155
+ request.body = body
156
+ else
157
+ request.headers["Content-Type"] = "application/json"
158
+ request.body = JSON.generate(body)
159
+ end
160
+ end
161
+
162
+ def multipart_body?(body)
163
+ return false unless body
164
+ return true if file_part?(body)
165
+
166
+ body.is_a?(Hash) && body.values.any? { |v| file_part?(v) }
167
+ end
168
+
169
+ def file_part?(value)
170
+ (defined?(Faraday::UploadIO) && value.is_a?(Faraday::UploadIO)) ||
171
+ (defined?(Faraday::Multipart::FilePart) && value.is_a?(Faraday::Multipart::FilePart))
172
+ end
173
+ end
174
+ end
data/lib/reducto_ai/config.rb ADDED
@@ -0,0 +1,30 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "logger"
4
+
5
+ module ReductoAI
6
+ class Config
7
+ attr_accessor :api_key, :base_url, :open_timeout, :read_timeout, :raise_exceptions
8
+ attr_writer :logger
9
+
10
+ def initialize
11
+ @api_key = ENV.fetch("REDUCTO_API_KEY", nil)
12
+ @base_url = ENV.fetch("REDUCTO_BASE_URL", "https://platform.reducto.ai")
13
+ @open_timeout = integer_or_default("REDUCTO_OPEN_TIMEOUT", 5)
14
+ @read_timeout = integer_or_default("REDUCTO_READ_TIMEOUT", 30)
15
+ @raise_exceptions = true
16
+ end
17
+
18
+ def logger
19
+ @logger ||= (defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger) || Logger.new($stderr)
20
+ end
21
+
22
+ private
23
+
24
+ def integer_or_default(key, default)
25
+ Integer(ENV.fetch(key, default))
26
+ rescue StandardError
27
+ default
28
+ end
29
+ end
30
+ end
data/lib/reducto_ai/engine.rb ADDED
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ if defined?(::Rails::Engine)
5
+ class Engine < ::Rails::Engine
6
+ isolate_namespace ReductoAI
7
+ end
8
+ end
9
+ end
data/lib/reducto_ai/errors.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ class Error < StandardError
5
+ attr_reader :status, :body
6
+
7
+ def initialize(message = nil, status: nil, body: nil)
8
+ super(message)
9
+ @status = status
10
+ @body = body
11
+ end
12
+ end
13
+
14
+ class AuthenticationError < Error; end
15
+ class ClientError < Error; end
16
+ class ServerError < Error; end
17
+ class NetworkError < Error; end
18
+ end
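
The error hierarchy above separates failures that may be transient (`ServerError`, `NetworkError`) from ones that retrying will not fix (`AuthenticationError`, `ClientError`). A minimal retry wrapper might look like the following. It is a sketch, not part of the gem, and it defines stand-in error classes so it runs on its own; with the gem loaded you would rescue `ReductoAI::ServerError` and `ReductoAI::NetworkError` instead. Retrying a processing call may consume credits more than once, so keep `attempts` small.

```ruby
# Stand-ins mirroring the gem's hierarchy so this sketch runs without the gem.
module StubErrors
  class Error < StandardError; end
  class ServerError < Error; end
  class NetworkError < Error; end
  class ClientError < Error; end
end

RETRYABLE = [StubErrors::ServerError, StubErrors::NetworkError].freeze

# Hypothetical helper: retries transient failures with exponential backoff
# (base_delay, 2 * base_delay, ...). Other errors propagate immediately.
def with_retries(attempts: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield
  rescue *RETRYABLE
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# With the gem (hypothetical usage):
#   with_retries { client.parse.sync(input: url) }
```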
data/lib/reducto_ai/resources/edit.rb ADDED
@@ -0,0 +1,46 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Edit
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def sync(input:, instructions:, **options)
11
+ raise ArgumentError, "input is required" if input.nil?
12
+ if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
13
+ raise ArgumentError, "instructions are required"
14
+ end
15
+
16
+ payload = build_payload(input, instructions, options)
17
+ @client.post("/edit", payload)
18
+ end
19
+
20
+ def async(input:, instructions:, async: nil, **options)
21
+ raise ArgumentError, "input is required" if input.nil?
22
+ if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
23
+ raise ArgumentError, "instructions are required"
24
+ end
25
+
26
+ payload = build_payload(input, instructions, options)
27
+ payload[:async] = async unless async.nil?
28
+
29
+ @client.post("/edit_async", payload)
30
+ end
31
+
32
+ private
33
+
34
+ def build_payload(input, instructions, options)
35
+ document_url = normalize_input(input)
36
+ { document_url: document_url, edit_instructions: instructions, **options }.compact
37
+ end
38
+
39
+ def normalize_input(input)
40
+ return input unless input.is_a?(Hash)
41
+
42
+ input[:url] || input["url"] || input
43
+ end
44
+ end
45
+ end
46
+ end
data/lib/reducto_ai/resources/extract.rb ADDED
@@ -0,0 +1,55 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Extract
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def sync(input:, instructions:, **options)
11
+ raise ArgumentError, "input is required" if input.nil?
12
+ if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
13
+ raise ArgumentError, "instructions are required"
14
+ end
15
+
16
+ payload = build_payload(input, instructions, options)
17
+ @client.post("/extract", payload)
18
+ end
19
+
20
+ def async(input:, instructions:, async: nil, **options)
21
+ raise ArgumentError, "input is required" if input.nil?
22
+ if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
23
+ raise ArgumentError, "instructions are required"
24
+ end
25
+
26
+ payload = build_payload(input, instructions, options)
27
+ payload[:async] = async unless async.nil?
28
+
29
+ @client.post("/extract_async", payload)
30
+ end
31
+
32
+ private
33
+
34
+ def build_payload(input, instructions, options)
35
+ normalized_input = normalize_input(input)
36
+ normalized_instructions = normalize_instructions(instructions)
37
+
38
+ { input: normalized_input, instructions: normalized_instructions, **options }.compact
39
+ end
40
+
41
+ def normalize_input(input)
42
+ return input unless input.is_a?(Hash)
43
+
44
+ input[:url] || input["url"] || input
45
+ end
46
+
47
+ def normalize_instructions(instructions)
48
+ return { schema: instructions } unless instructions.is_a?(Hash)
49
+ return instructions if instructions.key?(:schema) || instructions.key?("schema")
50
+
51
+ { schema: instructions }
52
+ end
53
+ end
54
+ end
55
+ end
data/lib/reducto_ai/resources/jobs.rb ADDED
@@ -0,0 +1,65 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Jobs
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def version
11
+ @client.request(:get, "/version")
12
+ end
13
+
14
+ def list(**options)
15
+ params = options.compact
16
+ @client.request(:get, "/jobs", params: params)
17
+ end
18
+
19
+ def cancel(job_id:)
20
+ raise ArgumentError, "job_id is required" if job_id.nil? || job_id.to_s.strip.empty?
21
+
22
+ @client.request(:post, "/cancel/#{job_id}")
23
+ end
24
+
25
+ def retrieve(job_id:)
26
+ raise ArgumentError, "job_id is required" if job_id.nil? || job_id.to_s.strip.empty?
27
+
28
+ @client.request(:get, "/job/#{job_id}")
29
+ end
30
+
31
+ def upload(file:, extension: nil)
32
+ raise ArgumentError, "file is required" if file.nil?
33
+
34
+ upload_io = build_upload_io(file)
35
+ body = { file: upload_io }
36
+ params = {}
37
+ params[:extension] = extension unless extension.nil?
38
+
39
+ @client.request(:post, "/upload", body: body, params: params)
40
+ end
41
+
42
+ def configure_webhook
43
+ @client.request(:post, "/configure_webhook")
44
+ end
45
+
46
+ private
47
+
48
+ def build_upload_io(file)
49
+ if file.is_a?(String)
50
+ raise ArgumentError, "file path does not exist" unless File.exist?(file)
51
+
52
+ filename = File.basename(file)
53
+ else
54
+ filename = if file.respond_to?(:path) && file.path
55
+ File.basename(file.path)
56
+ else
57
+ "upload"
58
+ end
59
+
60
+ end
61
+ Faraday::UploadIO.new(file, "application/octet-stream", filename)
62
+ end
63
+ end
64
+ end
65
+ end
data/lib/reducto_ai/resources/parse.rb ADDED
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Parse
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def sync(input:, **options)
11
+ raise ArgumentError, "input is required" if input.nil?
12
+
13
+ normalized_input = normalize_input(input)
14
+ payload = { input: normalized_input, **options }.compact
15
+ @client.post("/parse", payload)
16
+ end
17
+
18
+ def async(input:, async: nil, **options)
19
+ raise ArgumentError, "input is required" if input.nil?
20
+
21
+ normalized_input = normalize_input(input)
22
+ payload = { input: normalized_input }
23
+ payload[:async] = async unless async.nil?
24
+ payload.merge!(options.compact)
25
+
26
+ @client.post("/parse_async", payload)
27
+ end
28
+
29
+ private
30
+
31
+ def normalize_input(input)
32
+ return input unless input.is_a?(Hash)
33
+
34
+ input[:url] || input["url"] || input
35
+ end
36
+ end
37
+ end
38
+ end
data/lib/reducto_ai/resources/pipeline.rb ADDED
@@ -0,0 +1,30 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Pipeline
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def sync(input:, steps:, **options)
11
+ raise ArgumentError, "input is required" if input.nil?
12
+ raise ArgumentError, "steps are required" if steps.nil? || (steps.respond_to?(:empty?) && steps.empty?)
13
+
14
+ payload = { input: input, steps: steps, **options }.compact
15
+ @client.post("/pipeline", payload)
16
+ end
17
+
18
+ def async(input:, steps:, async: nil, **options)
19
+ raise ArgumentError, "input is required" if input.nil?
20
+ raise ArgumentError, "steps are required" if steps.nil? || (steps.respond_to?(:empty?) && steps.empty?)
21
+
22
+ payload = { input: input, steps: steps }
23
+ payload[:async] = async unless async.nil?
24
+ payload.merge!(options.compact)
25
+
26
+ @client.post("/pipeline_async", payload)
27
+ end
28
+ end
29
+ end
30
+ end
data/lib/reducto_ai/resources/split.rb ADDED
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ module Resources
5
+ class Split
6
+ def initialize(client)
7
+ @client = client
8
+ end
9
+
10
+ def sync(input:, **options)
11
+ raise ArgumentError, "input is required" if input.nil?
12
+
13
+ normalized_input = normalize_input(input)
14
+ payload = { input: normalized_input, **options }.compact
15
+ @client.post("/split", payload)
16
+ end
17
+
18
+ def async(input:, async: nil, **options)
19
+ raise ArgumentError, "input is required" if input.nil?
20
+
21
+ normalized_input = normalize_input(input)
22
+ payload = { input: normalized_input }
23
+ payload[:async] = async unless async.nil?
24
+ payload.merge!(options.compact)
25
+
26
+ @client.post("/split_async", payload)
27
+ end
28
+
29
+ private
30
+
31
+ def normalize_input(input)
32
+ return input unless input.is_a?(Hash)
33
+
34
+ input[:url] || input["url"] || input
35
+ end
36
+ end
37
+ end
38
+ end
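`Split#async`, like `Parse#async`, adds the `async` key only when the caller passes a non-nil value. The sketch below reproduces that payload assembly in a hypothetical `split_async_payload` method and leaves out input normalization for brevity. The `extra` option and the contents of the `async` hash are placeholders, not documented Reducto fields.

```ruby
# Reproduces how Resources::Split#async (and Parse#async) assemble the
# request body. `split_async_payload` is a name local to this sketch, and
# the option keys used below are hypothetical.
def split_async_payload(input, async: nil, **options)
  payload = { input: input }
  # The async key is added only when the caller passes a non-nil value,
  # so `async: false` is kept while an omitted argument is left out.
  payload[:async] = async unless async.nil?
  # Options are compacted before merging, dropping top-level nils.
  payload.merge!(options.compact)
end

p split_async_payload("https://example.com/doc.pdf")
p split_async_payload("https://example.com/doc.pdf", async: false)
p split_async_payload("https://example.com/doc.pdf", async: { note: "placeholder" }, extra: nil)
```

The guard is `unless async.nil?`, not a truthiness test, so `async: false` is still forwarded to the API.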
data/lib/reducto_ai/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module ReductoAI
4
+ VERSION = "0.1.0"
5
+ end
data/lib/reducto_ai.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "reducto_ai/version"
4
+ require_relative "reducto_ai/config"
5
+ require_relative "reducto_ai/errors"
6
+ require_relative "reducto_ai/client"
7
+ require_relative "reducto_ai/engine"
8
+
9
+ module ReductoAI
10
+ class << self
11
+ def config
12
+ @config ||= Config.new
13
+ end
14
+
15
+ def configure
16
+ yield(config)
17
+ end
18
+
19
+ def reset_configuration!
20
+ @config = nil
21
+ end
22
+ end
23
+ end
24
+
25
+ # Provide a compatibility alias without requiring Rails inflector acronym config
26
+ ReductoAi = ReductoAI unless defined?(ReductoAi)
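`lib/reducto_ai.rb` keeps a single memoized `Config` instance behind `ReductoAI.config`. `configure` yields that instance, and `reset_configuration!` discards it. The sketch below reproduces the pattern in a self-contained `DemoReducto` module. `config.rb` is not part of this excerpt, so a `Struct` with a hypothetical `api_key` attribute stands in for `Config`.

```ruby
# Sketch of the configuration pattern in lib/reducto_ai.rb. The real
# ReductoAI::Config lives in config.rb (not shown here), so this stand-in
# uses a Struct with a hypothetical `api_key` attribute.
module DemoReducto
  Config = Struct.new(:api_key)

  class << self
    # Memoized: every call returns the same object until reset.
    def config
      @config ||= Config.new
    end

    def configure
      yield(config)
    end

    # Drops the memoized object; the next #config call builds a fresh one.
    def reset_configuration!
      @config = nil
    end
  end
end

# Alias constant, mirroring `ReductoAi = ReductoAI unless defined?(ReductoAi)`.
DemoReductoAlias = DemoReducto unless defined?(DemoReductoAlias)

DemoReducto.configure { |c| c.api_key = "test-key" }
p DemoReducto.config.api_key
p DemoReductoAlias.config.equal?(DemoReducto.config)

DemoReducto.reset_configuration!
p DemoReducto.config.api_key
```

`reset_configuration!` is probably most useful in test suites, where each example can then start from a fresh configuration.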
data/sig/reducto_ai.rbs ADDED
@@ -0,0 +1,4 @@
1
+ module ReductoAI
2
+ VERSION: String
3
+ # See the writing guide of rbs: https://github.com/ruby/rbs#guides
4
+ end
metadata ADDED
@@ -0,0 +1,91 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: reducto_ai
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - dpaluy
8
+ bindir: exe
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies:
12
+ - !ruby/object:Gem::Dependency
13
+ name: faraday
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '2.9'
19
+ type: :runtime
20
+ prerelease: false
21
+ version_requirements: !ruby/object:Gem::Requirement
22
+ requirements:
23
+ - - "~>"
24
+ - !ruby/object:Gem::Version
25
+ version: '2.9'
26
+ - !ruby/object:Gem::Dependency
27
+ name: faraday-multipart
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '1.0'
33
+ type: :runtime
34
+ prerelease: false
35
+ version_requirements: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '1.0'
40
+ description: ReductoAI provides a lightweight Faraday-based wrapper for Reducto's
41
+ Parse, Split, Extract, Edit, and Pipeline endpoints, including async helpers and
42
+ Rails-friendly configuration.
43
+ email:
44
+ - dpaluy@users.noreply.github.com
45
+ executables: []
46
+ extensions: []
47
+ extra_rdoc_files: []
48
+ files:
49
+ - CHANGELOG.md
50
+ - LICENSE.txt
51
+ - README.md
52
+ - Rakefile
53
+ - lib/reducto_ai.rb
54
+ - lib/reducto_ai/client.rb
55
+ - lib/reducto_ai/config.rb
56
+ - lib/reducto_ai/engine.rb
57
+ - lib/reducto_ai/errors.rb
58
+ - lib/reducto_ai/resources/edit.rb
59
+ - lib/reducto_ai/resources/extract.rb
60
+ - lib/reducto_ai/resources/jobs.rb
61
+ - lib/reducto_ai/resources/parse.rb
62
+ - lib/reducto_ai/resources/pipeline.rb
63
+ - lib/reducto_ai/resources/split.rb
64
+ - lib/reducto_ai/version.rb
65
+ - sig/reducto_ai.rbs
66
+ homepage: https://github.com/dpaluy/reducto_ai
67
+ licenses:
68
+ - MIT
69
+ metadata:
70
+ rubygems_mfa_required: 'true'
71
+ homepage_uri: https://github.com/dpaluy/reducto_ai
72
+ source_code_uri: https://github.com/dpaluy/reducto_ai
73
+ changelog_uri: https://github.com/dpaluy/reducto_ai/blob/main/CHANGELOG.md
74
+ rdoc_options: []
75
+ require_paths:
76
+ - lib
77
+ required_ruby_version: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - ">="
80
+ - !ruby/object:Gem::Version
81
+ version: 3.2.0
82
+ required_rubygems_version: !ruby/object:Gem::Requirement
83
+ requirements:
84
+ - - ">="
85
+ - !ruby/object:Gem::Version
86
+ version: '0'
87
+ requirements: []
88
+ rubygems_version: 3.7.2
89
+ specification_version: 4
90
+ summary: Ruby client for the Reducto document intelligence API.
91
+ test_files: []