scanii-ruby 1.0.1 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0e3c62870b928231999c133d889594f002682b74b34769c5741dbfe9a9c12693
4
- data.tar.gz: a1673812fa7c2e46fa12657a3898fb01fdf56399c04146e95dfdf048d12ef185
3
+ metadata.gz: '0418285adf3f56f3389b379b9eedadd5b281591cd2c7e62ad05092a8e8b22595'
4
+ data.tar.gz: a61ce21d24bc78f557e09394868fc0c50b9fb2777b8649dce59b017a7d6a427c
5
5
  SHA512:
6
- metadata.gz: 1afa93f5a49873f5813dbf9701d5886b173a3e3d5dbdda9d214690308eb77d92b8ff96bde66b6fa78b2f455af32d317bf8e1ce1338824fa0a5801f2dbe4786cd
7
- data.tar.gz: 6ed39afbb9846cec5916f8a77c8119a717f6e5c6e6da81dd17756a456fa069735063a20762a772090400d09b1e4fd85c883a92176beecf36479fe10d38ae7bd4
6
+ metadata.gz: 504165ee4f450fa8266675e1eb9d43d54a44e76547a5a8fe870865b4123c1563a4d9bf693df893855ecef9197288beb0e5da3425bf3a4524aa76cc1d02bbec25
7
+ data.tar.gz: 897b47ed149ea571105dde92462e61d69f43fd496f20a50200dc64d0bce2e2ab7b52ade2837de7d5d6ac5a0f25f5de93f98f0d0ad5781f474b076d8c76c874e0
data/CHANGELOG.md CHANGED
@@ -2,6 +2,36 @@
2
2
 
3
3
  All notable changes to `scanii-ruby` are documented here. Versions follow [SemVer](https://semver.org).
4
4
 
5
+ ## 1.1.0 — Streaming standardization
6
+
7
+ Adds stream-based `process` and `process_async` methods, aligning scanii-ruby with the
8
+ cross-SDK streaming standard. File content is now truly streamed to the socket via
9
+ `Net::HTTP#body_stream=` rather than buffered into a single String.
10
+
11
+ ### New API
12
+
13
+ - `Scanii::Client#process(io, filename:, content_type: nil, metadata: nil, callback: nil)` →
14
+ `Scanii::ProcessingResult` — accepts any IO-like object (anything responding to `read(n)`).
15
+ Both `File` (opened with `File.open(path, "rb")`) and `StringIO` work.
16
+ - `Scanii::Client#process_file(path, metadata: nil, callback: nil)` →
17
+ `Scanii::ProcessingResult` — convenience wrapper that opens the file in binary mode and
18
+ delegates to `process`. This is the replacement for the old `process(path, ...)` form.
19
+ - Same shapes for `process_async` / `process_async_file`.
20
+
21
+ ### Deprecations
22
+
23
+ - `process(path_string, ...)` — passing a String path to `process` is deprecated; use
24
+ `process_file(path)` instead. The old form still works and emits a runtime `warn`. Will be
25
+ removed in a future major version.
26
+ - `process_async(path_string, ...)` — same; use `process_async_file(path)`. Will be removed
27
+ in a future major version.
28
+
29
+ ### Internals
30
+
31
+ - `Scanii::Multipart.stream_encode` replaces the old `encode`. Returns a `[ChainedIO,
32
+ content_type, content_length]` triple. `ChainedIO` reads prologue → caller IO → epilogue
33
+ without ever buffering the full body.
34
+
5
35
  ## 1.0.1 — Release infrastructure fix
6
36
 
7
37
  Wires up `bundler/gem_tasks` in the Rakefile so `bundle exec rake release` (invoked by `rubygems/release-gem@v1`) resolves correctly. v1.0.0 was tagged but never published to RubyGems because the release workflow failed at the `rake release` task lookup; v1.0.1 is functionally identical to that tag. No SDK behavior changes.
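
The 1.1.0 entry above describes a String-path form of `process` that still works but warns and delegates to `process_file`. A minimal sketch of that dispatch pattern follows. `SketchClient` and its return value are hypothetical stand-ins for illustration, not the gem's implementation, and no upload is performed.

```ruby
require "stringio"

# Hypothetical stand-in for a client; it only illustrates the dispatch pattern.
class SketchClient
  # Accepts an IO-like object. A String is treated as a deprecated path argument.
  def process(first_arg, filename: nil)
    if first_arg.is_a?(String)
      warn "[DEPRECATION] process(path) is deprecated; use process_file(path) instead."
      return process_file(first_arg)
    end

    raise ArgumentError, "io must respond to read" unless first_arg.respond_to?(:read)
    raise ArgumentError, "filename: is required" if filename.nil? || filename.to_s.empty?

    # A real client would upload here; the sketch only reports what it saw.
    { filename: filename.to_s, bytes: first_arg.read.bytesize }
  end

  # Opens the path in binary mode and delegates to the IO form.
  def process_file(path)
    File.open(path.to_s, "rb") do |f|
      process(f, filename: File.basename(path.to_s))
    end
  end
end
```

Callers that already hold bytes in memory would use the IO form, for example `SketchClient.new.process(StringIO.new(bytes), filename: "upload.bin")`, while path-based callers move to `process_file` and stop seeing the warning.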
data/README.md CHANGED
@@ -30,12 +30,23 @@ Targets Ruby 3.4+. Zero runtime dependencies.
30
30
 
31
31
  ## Quickstart
32
32
 
33
+ Scan a file from disk:
34
+
33
35
  ```ruby
34
36
  require "scanii"
35
37
 
36
38
  client = Scanii::Client.new(key: "your-key", secret: "your-secret")
37
39
 
38
- result = client.process("./file.pdf")
40
+ result = client.process_file("./file.pdf")
41
+ puts "findings: #{result.findings.inspect}"
42
+ ```
43
+
44
+ Scan content already in memory (no temp file needed):
45
+
46
+ ```ruby
47
+ require "stringio"
48
+
49
+ result = client.process(StringIO.new(bytes), filename: "upload.bin")
39
50
  puts "findings: #{result.findings.inspect}"
40
51
  ```
41
52
 
@@ -45,8 +56,10 @@ puts "findings: #{result.findings.inspect}"
45
56
 
46
57
  | Method | REST | Returns |
47
58
  |---|---|---|
48
- | `process(file_path, metadata:, callback:)` | `POST /files` | `Scanii::ProcessingResult` |
49
- | `process_async(file_path, metadata:, callback:)` | `POST /files/async` | `Scanii::PendingResult` |
59
+ | `process(io, filename:, content_type:, metadata:, callback:)` | `POST /files` | `Scanii::ProcessingResult` |
60
+ | `process_file(path, metadata:, callback:)` | `POST /files` | `Scanii::ProcessingResult` |
61
+ | `process_async(io, filename:, content_type:, metadata:, callback:)` | `POST /files/async` | `Scanii::PendingResult` |
62
+ | `process_async_file(path, metadata:, callback:)` | `POST /files/async` | `Scanii::PendingResult` |
50
63
  | `fetch(url, metadata:, callback:)` | `POST /files/fetch` | `Scanii::PendingResult` |
51
64
  | `retrieve(id)` | `GET /files/{id}` | `Scanii::ProcessingResult` |
52
65
  | `ping` | `GET /ping` | `true` |
data/lib/scanii/client.rb CHANGED
@@ -13,10 +13,13 @@ module Scanii
13
13
  #
14
14
  # @see https://scanii.github.io/openapi/v22/
15
15
  #
16
- # @example
16
+ # @example Scan a file from disk
17
17
  # client = Scanii::Client.new(key: "your-key", secret: "your-secret")
18
- # result = client.process("./file.pdf")
18
+ # result = client.process_file("./file.pdf")
19
19
  # puts result.findings # [] when clean
20
+ #
21
+ # @example Scan content already in memory
22
+ # result = client.process(StringIO.new(bytes), filename: "upload.bin")
20
23
  class Client
21
24
  DEFAULT_ENDPOINT = "https://api.scanii.com".freeze
22
25
  DEFAULT_TIMEOUT = 60
@@ -44,34 +47,111 @@ module Scanii
44
47
  @user_agent = user_agent && !user_agent.empty? ? "#{user_agent} #{USER_AGENT}" : USER_AGENT
45
48
  end
46
49
 
47
- # Submit a file for synchronous scanning.
50
+ # Submit an IO-like object for synchronous scanning.
51
+ #
52
+ # +io+ is duck-typed: anything responding to +read(n)+ returning a String.
53
+ # Both +File+ (opened with +File.open(path, "rb")+) and +StringIO+ work.
54
+ # The body is streamed to the socket; file content is never fully buffered.
55
+ #
56
+ # Passing a String path is deprecated — use {#process_file} instead.
57
+ #
58
+ # @overload process(io, filename:, content_type: nil, metadata: nil, callback: nil)
59
+ # @param io [#read] IO-like object
60
+ # @param filename [String] filename sent in the multipart part
61
+ # @param content_type [String, nil] content-type of the file part; guessed from filename when nil
62
+ # @param metadata [Hash{String=>String}, nil] arbitrary key/value pairs attached to the result
63
+ # @param callback [String, nil] URL to POST the result to on completion
48
64
  #
49
65
  # @see https://scanii.github.io/openapi/v22/ POST /files
50
66
  # @return [Scanii::ProcessingResult]
51
- def process(file_path, metadata: nil, callback: nil)
52
- assert_readable(file_path)
67
+ def process(first_arg, filename: nil, content_type: nil, metadata: nil, callback: nil)
68
+ if first_arg.is_a?(String)
69
+ # @deprecated Use {#process_file} instead. Will be removed in a future major version.
70
+ warn "[DEPRECATION] `Scanii::Client#process(path)` is deprecated; " \
71
+ "use `process_file(path)` instead. Will be removed in a future major version."
72
+ return process_file(first_arg, metadata: metadata, callback: callback)
73
+ end
74
+
75
+ raise ArgumentError, "io must respond to read" unless first_arg.respond_to?(:read)
76
+ raise ArgumentError, "filename: is required" if filename.nil? || filename.to_s.empty?
77
+
53
78
  fields = build_text_fields(metadata, callback)
54
- body, content_type = Multipart.encode(fields, file_path)
55
- status, resp_body, headers = post("/files", body: body, content_type: content_type)
79
+ stream, ct, length = Multipart.stream_encode(fields, first_arg, filename.to_s, content_type)
80
+ status, resp_body, headers = post("/files", body_stream: stream, content_type: ct,
81
+ content_length: length)
56
82
  raise_for_status(status, resp_body, headers) unless status == 201
57
83
  ProcessingResult.from_response(resp_body, headers)
58
84
  end
59
85
 
60
- # Submit a file for server-side asynchronous scanning. Returns a pending
61
- # id; the final result is delivered to +callback+ (when supplied) or
62
- # fetched via #retrieve.
86
+ # Submit a file path for synchronous scanning.
87
+ #
88
+ # Opens the file in binary mode, streams it to Scanii, and closes it.
89
+ # Delegates to {#process} with +filename+ set to the basename.
90
+ #
91
+ # @param file_path [String] path to the file to upload
92
+ # @param metadata [Hash{String=>String}, nil]
93
+ # @param callback [String, nil]
94
+ # @see https://scanii.github.io/openapi/v22/ POST /files
95
+ # @return [Scanii::ProcessingResult]
96
+ def process_file(file_path, metadata: nil, callback: nil)
97
+ assert_readable(file_path)
98
+ File.open(file_path.to_s, "rb") do |f|
99
+ process(f, filename: File.basename(file_path.to_s), metadata: metadata, callback: callback)
100
+ end
101
+ end
102
+
103
+ # Submit an IO-like object for server-side asynchronous scanning.
104
+ #
105
+ # Returns a pending id; the final result is delivered to +callback+ (when
106
+ # supplied) or fetched via {#retrieve}.
107
+ #
108
+ # Passing a String path is deprecated — use {#process_async_file} instead.
109
+ #
110
+ # @overload process_async(io, filename:, content_type: nil, metadata: nil, callback: nil)
111
+ # @param io [#read] IO-like object
112
+ # @param filename [String] filename sent in the multipart part
113
+ # @param content_type [String, nil]
114
+ # @param metadata [Hash{String=>String}, nil]
115
+ # @param callback [String, nil]
63
116
  #
64
117
  # @see https://scanii.github.io/openapi/v22/ POST /files/async
65
118
  # @return [Scanii::PendingResult]
66
- def process_async(file_path, metadata: nil, callback: nil)
67
- assert_readable(file_path)
119
+ def process_async(first_arg, filename: nil, content_type: nil, metadata: nil, callback: nil)
120
+ if first_arg.is_a?(String)
121
+ # @deprecated Use {#process_async_file} instead. Will be removed in a future major version.
122
+ warn "[DEPRECATION] `Scanii::Client#process_async(path)` is deprecated; " \
123
+ "use `process_async_file(path)` instead. Will be removed in a future major version."
124
+ return process_async_file(first_arg, metadata: metadata, callback: callback)
125
+ end
126
+
127
+ raise ArgumentError, "io must respond to read" unless first_arg.respond_to?(:read)
128
+ raise ArgumentError, "filename: is required" if filename.nil? || filename.to_s.empty?
129
+
68
130
  fields = build_text_fields(metadata, callback)
69
- body, content_type = Multipart.encode(fields, file_path)
70
- status, resp_body, headers = post("/files/async", body: body, content_type: content_type)
131
+ stream, ct, length = Multipart.stream_encode(fields, first_arg, filename.to_s, content_type)
132
+ status, resp_body, headers = post("/files/async", body_stream: stream, content_type: ct,
133
+ content_length: length)
71
134
  raise_for_status(status, resp_body, headers) unless status == 202
72
135
  PendingResult.from_response(resp_body, headers)
73
136
  end
74
137
 
138
+ # Submit a file path for server-side asynchronous scanning.
139
+ #
140
+ # Opens the file in binary mode and delegates to {#process_async}.
141
+ #
142
+ # @param file_path [String] path to the file to upload
143
+ # @param metadata [Hash{String=>String}, nil]
144
+ # @param callback [String, nil]
145
+ # @see https://scanii.github.io/openapi/v22/ POST /files/async
146
+ # @return [Scanii::PendingResult]
147
+ def process_async_file(file_path, metadata: nil, callback: nil)
148
+ assert_readable(file_path)
149
+ File.open(file_path.to_s, "rb") do |f|
150
+ process_async(f, filename: File.basename(file_path.to_s), metadata: metadata,
151
+ callback: callback)
152
+ end
153
+ end
154
+
75
155
  # Ask Scanii to download a remote URL and scan it asynchronously.
76
156
  #
77
157
  # @see https://scanii.github.io/openapi/v22/ POST /files/fetch
@@ -190,14 +270,16 @@ module Scanii
190
270
  fields
191
271
  end
192
272
 
193
- def post(path, body:, content_type:)
194
- request("POST", path, body: body, content_type: content_type)
273
+ def post(path, body: nil, content_type: nil, body_stream: nil, content_length: nil)
274
+ request("POST", path, body: body, content_type: content_type,
275
+ body_stream: body_stream, content_length: content_length)
195
276
  end
196
277
 
197
- def request(method, path, body: nil, content_type: nil)
278
+ def request(method, path, body: nil, content_type: nil, body_stream: nil, content_length: nil)
198
279
  uri = URI.parse("#{@base_uri}#{path}")
199
280
 
200
- req = build_request(method, uri, body, content_type)
281
+ req = build_request(method, uri, body: body, content_type: content_type,
282
+ body_stream: body_stream, content_length: content_length)
201
283
 
202
284
  Net::HTTP.start(uri.hostname, uri.port,
203
285
  use_ssl: uri.scheme == "https",
@@ -211,7 +293,7 @@ module Scanii
211
293
  raise Scanii::Error, "transport error: #{e.class}: #{e.message}"
212
294
  end
213
295
 
214
- def build_request(method, uri, body, content_type)
296
+ def build_request(method, uri, body: nil, content_type: nil, body_stream: nil, content_length: nil)
215
297
  klass = case method
216
298
  when "GET" then Net::HTTP::Get
217
299
  when "POST" then Net::HTTP::Post
@@ -224,7 +306,14 @@ module Scanii
224
306
  req["User-Agent"] = @user_agent
225
307
  req["Accept"] = "application/json"
226
308
  req["Content-Type"] = content_type if content_type
227
- req.body = body if body
309
+
310
+ if body_stream
311
+ req.body_stream = body_stream
312
+ req["Content-Length"] = content_length.to_s
313
+ elsif body
314
+ req.body = body
315
+ end
316
+
228
317
  req
229
318
  end
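
The `build_request` change above switches uploads from `req.body=` to `req.body_stream=` with an explicit `Content-Length`. As a rough illustration, the sketch below builds such a request with the standard library without sending it. The helper name `build_streaming_post` and the URL are placeholders, not part of the gem. Net::HTTP needs either a `Content-Length` header or `Transfer-Encoding: chunked` when a body stream is set, which is why the length is computed up front.

```ruby
require "net/http"
require "stringio"
require "uri"

# Build (but do not send) a POST whose body is read from an IO in chunks.
# The helper name and URL are placeholders for illustration only.
def build_streaming_post(url, io, content_type:, content_length:)
  req = Net::HTTP::Post.new(URI.parse(url))
  req["Content-Type"] = content_type
  req.body_stream = io                        # Net::HTTP reads from this when sending
  req["Content-Length"] = content_length.to_s # required unless chunked encoding is used
  req
end

payload = StringIO.new("example bytes".b)
req = build_streaming_post("https://example.invalid/files", payload,
                           content_type: "application/octet-stream",
                           content_length: payload.size)
```

Nothing is read from `payload` until the request is actually sent, so memory use stays bounded by Net::HTTP's internal chunk size rather than by the file size.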
230
319
 
@@ -1,14 +1,11 @@
1
1
  require "securerandom"
2
+ require "stringio"
2
3
 
3
4
  module Scanii
4
5
  # Hand-rolled multipart/form-data encoder (RFC 7578).
5
6
  #
6
7
  # Ruby's stdlib Net::HTTP does not bundle a multipart encoder; this is the
7
8
  # smallest viable implementation that covers the Scanii POST /files payload.
8
- #
9
- # Body is assembled as a single binary-encoded String -- file contents are
10
- # read into memory rather than streamed. This matches the PHP SDK's approach;
11
- # callers scanning very large files should be aware.
12
9
  module Multipart
13
10
  module_function
14
11
 
@@ -22,42 +19,45 @@ module Scanii
22
19
  "multipart/form-data; boundary=#{boundary}"
23
20
  end
24
21
 
25
- # Encode a multipart body containing the bytes at file_path plus the given
26
- # text fields.
22
+ # Encode a multipart body as a streaming ChainedIO.
23
+ #
24
+ # Builds the RFC 7578 prologue and epilogue as binary Strings, chains them
25
+ # around the caller's IO, and returns the triple required for
26
+ # Net::HTTP body_stream= uploads. The caller's IO is never read here --
27
+ # only when Net::HTTP reads from the returned ChainedIO.
27
28
  #
28
- # @param fields [Hash{String=>String}] text form fields (e.g. metadata[k] => v, callback => url)
29
- # @param file_path [String] path to the file to upload
29
+ # @param fields [Hash{String=>String}] text form fields (e.g. metadata[k]=v, callback)
30
+ # @param io [#read, #size] IO-like object (anything responding to read(n))
31
+ # @param filename [String] filename for the file part
32
+ # @param content_type [String, nil] content-type of the file part; falls back to extension lookup
30
33
  # @param file_field [String] name of the file form field; default "file"
31
- # @return [Array(String, String)] tuple of [body, content_type]
32
- def encode(fields, file_path, file_field: "file")
34
+ # @return [Array(ChainedIO, String, Integer)] [body_stream, content_type_header, content_length]
35
+ def stream_encode(fields, io, filename, content_type = nil, file_field: "file")
33
36
  boundary = make_boundary
37
+ ct = content_type || guess_content_type(filename)
34
38
 
35
- filename = File.basename(file_path)
36
- content_type = guess_content_type(file_path)
37
- file_bytes = File.binread(file_path)
38
-
39
- body = String.new(encoding: Encoding::BINARY)
40
-
39
+ prologue = String.new(encoding: Encoding::BINARY)
41
40
  fields.each do |name, value|
42
- write_text_part(body, boundary, name.to_s, value.to_s)
41
+ write_text_part(prologue, boundary, name.to_s, value.to_s)
43
42
  end
43
+ prologue << "--#{boundary}\r\n".b
44
+ prologue << "Content-Disposition: form-data; name=\"#{file_field}\"; filename=\"#{filename}\"\r\n".b
45
+ prologue << "Content-Type: #{ct}\r\n\r\n".b
44
46
 
45
- body << "--#{boundary}\r\n".b
46
- body << "Content-Disposition: form-data; name=\"#{file_field}\"; filename=\"#{filename}\"\r\n".b
47
- body << "Content-Type: #{content_type}\r\n\r\n".b
48
- body << file_bytes.b
49
- body << "\r\n".b
50
- body << "--#{boundary}--\r\n".b
47
+ epilogue = "\r\n--#{boundary}--\r\n".b
51
48
 
52
- [body, make_content_type(boundary)]
49
+ io_size = io_remaining_bytes(io)
50
+ total_length = prologue.bytesize + io_size + epilogue.bytesize
51
+
52
+ [ChainedIO.new(prologue, io, epilogue), make_content_type(boundary), total_length]
53
53
  end
54
54
 
55
- # Best-effort content-type lookup by extension. Falls back to
55
+ # Best-effort content-type lookup by filename extension. Falls back to
56
56
  # application/octet-stream. The Scanii API does not require an accurate
57
57
  # content-type on the multipart part -- the server inspects the bytes -- so
58
58
  # a short table is sufficient.
59
- def guess_content_type(path)
60
- ext = File.extname(path).delete_prefix(".").downcase
59
+ def guess_content_type(filename)
60
+ ext = File.extname(filename.to_s).delete_prefix(".").downcase
61
61
  MIME_TYPES.fetch(ext, "application/octet-stream")
62
62
  end
63
63
 
@@ -96,5 +96,60 @@ module Scanii
96
96
  body << "\r\n".b
97
97
  end
98
98
  private_class_method :write_text_part
99
+
100
+ # Return the number of bytes remaining to be read from io.
101
+ # Requires io to respond to size (File and StringIO both do).
102
+ def io_remaining_bytes(io)
103
+ if io.respond_to?(:pos) && io.respond_to?(:size)
104
+ io.size - io.pos
105
+ elsif io.respond_to?(:size)
106
+ io.size
107
+ else
108
+ raise ArgumentError, "io must respond to size (File and StringIO do; got #{io.class})"
109
+ end
110
+ end
111
+ private_class_method :io_remaining_bytes
112
+
113
+ # Reads prologue_str, then io, then epilogue_str in sequence.
114
+ # Used as Net::HTTP body_stream for streaming multipart uploads.
115
+ class ChainedIO
116
+ def initialize(prologue, io, epilogue)
117
+ @parts = [StringIO.new(prologue), io, StringIO.new(epilogue)]
118
+ @idx = 0
119
+ end
120
+
121
+ def read(length = nil, buf = nil)
122
+ result = length.nil? ? read_all : read_n(length)
123
+ return nil if result.nil?
124
+
125
+ buf ? buf.replace(result) : result
126
+ end
127
+
128
+ private
129
+
130
+ def read_all
131
+ result = String.new(encoding: Encoding::BINARY)
132
+ @parts[@idx..].each do |part|
133
+ chunk = part.read
134
+ result << chunk.b if chunk
135
+ end
136
+ @idx = @parts.size
137
+ result
138
+ end
139
+
140
+ def read_n(length)
141
+ result = String.new(encoding: Encoding::BINARY)
142
+ while result.bytesize < length && @idx < @parts.size
143
+ chunk = @parts[@idx].read(length - result.bytesize)
144
+ if chunk.nil? || chunk.empty?
145
+ @idx += 1
146
+ else
147
+ result << chunk.b
148
+ end
149
+ end
150
+ result.empty? ? nil : result
151
+ end
152
+ end
153
+ private_constant :ChainedIO
99
154
  end
100
155
  end
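
The multipart change above chains a prologue, the caller's IO, and an epilogue, and computes the total length from `io.size - io.pos`. The sketch below shows the same idea in isolation. `SketchChain`, the boundary string, and the filename are made up for illustration; the gem's own `ChainedIO` is a private constant and may differ in detail. As in the diff, this approach assumes the IO can report its size, so IO objects that cannot (for example a pipe) would need a different strategy.

```ruby
require "stringio"

# Illustrative chained reader; the class name is hypothetical.
class SketchChain
  def initialize(*ios)
    @parts = ios
    @idx = 0
  end

  # Returns up to +length+ bytes, or nil once every part is exhausted.
  def read(length)
    out = String.new(encoding: Encoding::BINARY)
    while out.bytesize < length && @idx < @parts.size
      chunk = @parts[@idx].read(length - out.bytesize)
      if chunk.nil? || chunk.empty?
        @idx += 1 # current part is drained; move on to the next one
      else
        out << chunk.b
      end
    end
    out.empty? ? nil : out
  end
end

boundary = "sketch-boundary"
prologue = "--#{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"a.bin\"\r\n" \
           "Content-Type: application/octet-stream\r\n\r\n".b
epilogue = "\r\n--#{boundary}--\r\n".b
file_io  = StringIO.new("payload".b)

# Content-Length must be known up front: framing bytes plus unread payload bytes.
total_length = prologue.bytesize + (file_io.size - file_io.pos) + epilogue.bytesize

chain = SketchChain.new(StringIO.new(prologue), file_io, StringIO.new(epilogue))
body = String.new(encoding: Encoding::BINARY)
while (chunk = chain.read(16))
  body << chunk
end
```

Reading in 16-byte chunks here is only to exercise the part boundaries; the bytes collected in `body` should add up to `total_length`, which is the value a client would send as `Content-Length`.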
@@ -1,3 +1,3 @@
1
1
  module Scanii
2
- VERSION = "1.0.1".freeze
2
+ VERSION = "1.1.0".freeze
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: scanii-ruby
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.1
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Scanii