RubyGems - carbon_ruby_sdk - Versions diffs - 0.2.24 → 0.2.25 - Mend

carbon_ruby_sdk 0.2.24 → 0.2.25

Files changed (9)

checksums.yaml +4 -4
data/Gemfile.lock +2 -2
data/README.md +6 -6
data/lib/carbon_ruby_sdk/api/files_api.rb +8 -8
data/lib/carbon_ruby_sdk/api/integrations_api.rb +2 -2
data/lib/carbon_ruby_sdk/models/o_auth_url_request.rb +1 -1
data/lib/carbon_ruby_sdk/version.rb +1 -1
data/spec/api/files_api_spec.rb +2 -2
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4b61e2385085bb9887386f845b210ba35f30fcb7c4e318827914fdc2d4311a8d
-  data.tar.gz: c939329f0ef41fcbdc4a10302d6c587665078d563d969660bae0d5af05e9514d
+  metadata.gz: 744d7329a0dd05a7632f4b4738194bea0627fb74b8fe2f9f35f9bc18c073f885
+  data.tar.gz: 966a7e4c665bc51a25afa98e53ef4d162eaed77a09e0d5a8d95e269fe53e8aa9
 SHA512:
-  metadata.gz: e109bbf4cdf4b20423a509411a212b0e1b9ce246573a56cb24776f04642dae068ca3710c2748f798ddcc8aa976a18ebb92250ce4afe797efdd31efb128de6926
-  data.tar.gz: d98be3101307c2805cbeb8a8b26454f57d32fa0b40cb580902fb23c58ee0c4973e332081d1ad499ea1f8d0858545784f8c2ab98d87a45fd2500c902f0164e800
+  metadata.gz: 436e1932ef9c0933cdefa4646123806d5a342c29700bf9a01c4104e1d31e1e9511400e63c436137361c80943fc8d4fe3c3d87a410c06c10dc6b7c6bb0d54e790
+  data.tar.gz: 1b4199c6ae7fc7f2ea6761aa308614672e1871b8f81f75da2ac7ac87e405260649c0fcfa0edc4011ffde778a83762d990251bc4b8662accf8dc1af9948a32bea
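
The SHA256 and SHA512 entries above are hex digests of the two archives packed inside the `.gem` file. As a minimal sketch (not part of the gem), digests of this kind can be computed with Ruby's standard `digest` library. The helper names `archive_digests` and `digest_matches?` are hypothetical, and a real check would pass in the bytes of the `metadata.gz` and `data.tar.gz` extracted from the gem.

```ruby
require 'digest'

# Hypothetical helper: returns SHA256 and SHA512 hex digests for a byte string,
# mirroring the two digest families listed in checksums.yaml.
def archive_digests(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: true when the SHA256 of `bytes` equals the expected
# hex digest (surrounding whitespace and letter case are ignored).
def digest_matches?(bytes, expected_sha256)
  Digest::SHA256.hexdigest(bytes) == expected_sha256.strip.downcase
end
```

For example, `digest_matches?(File.binread('data.tar.gz'), expected)` would compare an extracted archive against the value recorded for that release, assuming the file has been unpacked into the current directory.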

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    carbon_ruby_sdk (0.2.24)
+    carbon_ruby_sdk (0.2.25)
       faraday (>= 1.0.1, < 3.0)
       faraday-multipart (~> 1.0, >= 1.0.4)
@@ -52,7 +52,7 @@ GEM
       rspec-mocks (~> 3.13.0)
     rspec-core (3.13.0)
       rspec-support (~> 3.13.0)
-    rspec-expectations (3.13.1)
+    rspec-expectations (3.13.2)
       diff-lcs (>= 1.2.0, < 2.0)
       rspec-support (~> 3.13.0)
     rspec-mocks (3.13.1)
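
The constraints in this lockfile (`~> 3.13.0`, `>= 1.0.1, < 3.0`) can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The sketch below is illustrative only: the helper name `allowed?` and the sample version numbers are assumptions, not taken from the gem.

```ruby
require 'rubygems'

# Hypothetical helper: true when `version` satisfies every constraint given.
def allowed?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

# '~> 3.13.0' is the pessimistic operator: >= 3.13.0 and < 3.14.
patch_bump_ok = allowed?('3.13.2', '~> 3.13.0')
minor_bump_ok = allowed?('3.14.0', '~> 3.13.0')

# The faraday range from the lockfile, checked against sample versions.
faraday_two_ok   = allowed?('2.9.0', '>= 1.0.1', '< 3.0')
faraday_three_ok = allowed?('3.0.0', '>= 1.0.1', '< 3.0')

puts "3.13.2 within ~> 3.13.0: #{patch_bump_ok}"
puts "3.14.0 within ~> 3.13.0: #{minor_bump_ok}"
puts "faraday 2.9.0 in range:  #{faraday_two_ok}"
puts "faraday 3.0.0 in range:  #{faraday_three_ok}"
```

This is why a patch-level bump such as 3.13.1 to 3.13.2 can appear in the lockfile without any constraint being edited.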

data/README.md CHANGED

@@ -6,7 +6,7 @@
 Connect external data to LLMs, no matter the source.
-[![npm](https://img.shields.io/badge/gem-v0.2.24-blue)](https://rubygems.org/gems/carbon_ruby_sdk/versions/0.2.24)
+[![npm](https://img.shields.io/badge/gem-v0.2.25-blue)](https://rubygems.org/gems/carbon_ruby_sdk/versions/0.2.25)
 </div>
@@ -93,7 +93,7 @@ Connect external data to LLMs, no matter the source.
 Add to Gemfile:
 ```ruby
-gem 'carbon_ruby_sdk', '~> 0.2.24'
+gem 'carbon_ruby_sdk', '~> 0.2.25'
 ```
 ## Getting Started<a id="getting-started"></a>
@@ -1114,7 +1114,7 @@ of all possible query parameters:
 - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings
 - `set_page_as_boundary`: described above
 - `embedding_model`: the model used to generate embeddings for the document chunks
-- `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently)
+- `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
 - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search.
 - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text
@@ -1178,8 +1178,8 @@ description route description for more information.
 Embedding model that will be used to embed file chunks.
 ##### use_ocr: `Boolean`<a id="use_ocr-boolean"></a>
-Whether or not to use OCR when processing files. Only valid for PDFs. Useful for
-documents with tables, images, and/or scanned text.
+Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and
+PNGs. Useful for documents with tables, images, and/or scanned text.
 ##### generate_sparse_vectors: `Boolean`<a id="generate_sparse_vectors-boolean"></a>
 Whether or not to generate sparse vectors for the file. This is *required* for
@@ -1695,7 +1695,7 @@ This request id will be added to all files that get synced using the generated
 OAuth URL
 ##### use_ocr: `Boolean`<a id="use_ocr-boolean"></a>
-Enable OCR for files that support it. Supported formats: png, jpg, pdf
+Enable OCR for files that support it. Supported formats: pdf, jpg, png
 ##### parse_pdf_tables_with_ocr: `Boolean`<a id="parse_pdf_tables_with_ocr-boolean"></a>
 ##### enable_file_picker: `Boolean`<a id="enable_file_picker-boolean"></a>
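
The README change above widens `use_ocr` from PDFs only to PDFs, JPEGs, and PNGs. A caller might want a client-side pre-check before setting the flag. The following is only a sketch under stated assumptions: `OCR_EXTENSIONS`, `ocr_supported?`, and `upload_options` are hypothetical names, not part of carbon_ruby_sdk, and treating `.jpeg` as equivalent to `.jpg` is an assumption. The server remains the authority on which files are actually OCR'd.

```ruby
# Hypothetical constant: extensions matching the formats named in the updated
# docs (pdf, jpg, png). Including '.jpeg' is an assumption.
OCR_EXTENSIONS = %w[.pdf .jpg .jpeg .png].freeze

# Hypothetical helper: true when the filename's extension (case-insensitive)
# is one of the documented OCR-capable formats.
def ocr_supported?(filename)
  OCR_EXTENSIONS.include?(File.extname(filename).downcase)
end

# Hypothetical helper: builds an options hash with `use_ocr` derived from the
# filename. Extra options are merged last, so a caller can override `use_ocr`.
def upload_options(filename, **extra)
  { use_ocr: ocr_supported?(filename) }.merge(extra)
end
```

The resulting hash uses the documented `use_ocr` option name, so it could be passed along with whatever other upload parameters the caller already supplies.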

data/lib/carbon_ruby_sdk/api/files_api.rb CHANGED

@@ -1341,7 +1341,7 @@ module Carbon
     # - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings
     # - `set_page_as_boundary`: described above
     # - `embedding_model`: the model used to generate embeddings for the document chunks
-    # - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently)
+    # - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
     # - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search.
     # - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text
     #
@@ -1363,7 +1363,7 @@ module Carbon
     # @param skip_embedding_generation [Boolean] Flag to control whether or not embeddings should be generated and stored when processing file.
     # @param set_page_as_boundary [Boolean] Flag to control whether or not to set the a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See description route description for more information.
     # @param embedding_model [EmbeddingModel] Embedding model that will be used to embed file chunks.
-    # @param use_ocr [Boolean] Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with tables, images, and/or scanned text.
+    # @param use_ocr [Boolean] Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.
     # @param generate_sparse_vectors [Boolean] Whether or not to generate sparse vectors for the file. This is *required* for the file to be a candidate for hybrid search.
     # @param prepend_filename_to_chunks [Boolean] Whether or not to prepend the file's name to chunks.
     # @param max_items_per_chunk [Integer] Number of objects per chunk. For csv, tsv, xlsx, and json files only.
@@ -1414,7 +1414,7 @@ module Carbon
     # - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings
     # - `set_page_as_boundary`: described above
     # - `embedding_model`: the model used to generate embeddings for the document chunks
-    # - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently)
+    # - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
     # - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search.
     # - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text
     #
@@ -1436,7 +1436,7 @@ module Carbon
     # @param skip_embedding_generation [Boolean] Flag to control whether or not embeddings should be generated and stored when processing file.
     # @param set_page_as_boundary [Boolean] Flag to control whether or not to set the a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See description route description for more information.
     # @param embedding_model [EmbeddingModel] Embedding model that will be used to embed file chunks.
-    # @param use_ocr [Boolean] Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with tables, images, and/or scanned text.
+    # @param use_ocr [Boolean] Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.
     # @param generate_sparse_vectors [Boolean] Whether or not to generate sparse vectors for the file. This is *required* for the file to be a candidate for hybrid search.
     # @param prepend_filename_to_chunks [Boolean] Whether or not to prepend the file's name to chunks.
     # @param max_items_per_chunk [Integer] Number of objects per chunk. For csv, tsv, xlsx, and json files only.
@@ -1475,7 +1475,7 @@ module Carbon
     end
     # Create Upload File
-    # This endpoint is used to directly upload local files to Carbon. The `POST` request should be a multipart form request. Note that the `set_page_as_boundary` query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - `chunk_size`: the chunk size (in tokens) applied when splitting the document - `chunk_overlap`: the chunk overlap (in tokens) applied when splitting the document - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings - `set_page_as_boundary`: described above - `embedding_model`: the model used to generate embeddings for the document chunks - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently) - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search. - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's embed-multilingual-v3.0. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query  parameter in `/uploadfile`). If no model is supplied, the `text-embedding-ada-002` is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. 
If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.
+    # This endpoint is used to directly upload local files to Carbon. The `POST` request should be a multipart form request. Note that the `set_page_as_boundary` query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - `chunk_size`: the chunk size (in tokens) applied when splitting the document - `chunk_overlap`: the chunk overlap (in tokens) applied when splitting the document - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings - `set_page_as_boundary`: described above - `embedding_model`: the model used to generate embeddings for the document chunks - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search. - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's embed-multilingual-v3.0. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query  parameter in `/uploadfile`). If no model is supplied, the `text-embedding-ada-002` is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. 
If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.
     # @param file [File]
     # @param body_create_upload_file_uploadfile_post [BodyCreateUploadFileUploadfilePost]
     # @param [Hash] opts the optional parameters
@@ -1484,7 +1484,7 @@ module Carbon
     # @option opts [Boolean] :skip_embedding_generation Flag to control whether or not embeddings should be generated and stored             when processing file. (default to false)
     # @option opts [Boolean] :set_page_as_boundary Flag to control whether or not to set the a page's worth of content as the maximum             amount of content that can appear in a chunk. Only valid for PDFs. See description route description for             more information. (default to false)
     # @option opts [EmbeddingModel] :embedding_model Embedding model that will be used to embed file chunks. (default to 'OPENAI')
-    # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with             tables, images, and/or scanned text. (default to false)
+    # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with             tables, images, and/or scanned text. (default to false)
     # @option opts [Boolean] :generate_sparse_vectors Whether or not to generate sparse vectors for the file. This is *required* for the file to be a             candidate for hybrid search. (default to false)
     # @option opts [Boolean] :prepend_filename_to_chunks Whether or not to prepend the file's name to chunks. (default to false)
     # @option opts [Integer] :max_items_per_chunk Number of objects per chunk. For csv, tsv, xlsx, and json files only.
@@ -1503,7 +1503,7 @@ module Carbon
     end
     # Create Upload File
-    # This endpoint is used to directly upload local files to Carbon. The &#x60;POST&#x60; request should be a multipart form request. Note that the &#x60;set_page_as_boundary&#x60; query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - &#x60;chunk_size&#x60;: the chunk size (in tokens) applied when splitting the document - &#x60;chunk_overlap&#x60;: the chunk overlap (in tokens) applied when splitting the document - &#x60;skip_embedding_generation&#x60;: whether or not to skip the generation of chunks and embeddings - &#x60;set_page_as_boundary&#x60;: described above - &#x60;embedding_model&#x60;: the model used to generate embeddings for the document chunks - &#x60;use_ocr&#x60;: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently) - &#x60;generate_sparse_vectors&#x60;: whether or not to generate sparse vectors for the file. Required for hybrid search. - &#x60;prepend_filename_to_chunks&#x60;: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI&#39;s multimodal model; for text, we support OpenAI&#39;s &#x60;text-embedding-ada-002&#x60; and Cohere&#39;s embed-multilingual-v3.0. The model can be specified via the &#x60;embedding_model&#x60; parameter (in the POST body for &#x60;/embeddings&#x60;, and a query  parameter in &#x60;/uploadfile&#x60;). If no model is supplied, the &#x60;text-embedding-ada-002&#x60; is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. 
For example, if files A and B have embeddings generated with &#x60;OPENAI&#x60;, and files C and D have embeddings generated with &#x60;COHERE_MULTILINGUAL_V3&#x60;, then by default, queries will only consider files A and B. If &#x60;COHERE_MULTILINGUAL_V3&#x60; is specified as the &#x60;embedding_model&#x60; in &#x60;/embeddings&#x60;, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set &#x60;VERTEX_MULTIMODAL&#x60; as an &#x60;embedding_model&#x60;. This model is used automatically by Carbon when it detects an image file.
+    # This endpoint is used to directly upload local files to Carbon. The &#x60;POST&#x60; request should be a multipart form request. Note that the &#x60;set_page_as_boundary&#x60; query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - &#x60;chunk_size&#x60;: the chunk size (in tokens) applied when splitting the document - &#x60;chunk_overlap&#x60;: the chunk overlap (in tokens) applied when splitting the document - &#x60;skip_embedding_generation&#x60;: whether or not to skip the generation of chunks and embeddings - &#x60;set_page_as_boundary&#x60;: described above - &#x60;embedding_model&#x60;: the model used to generate embeddings for the document chunks - &#x60;use_ocr&#x60;: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs - &#x60;generate_sparse_vectors&#x60;: whether or not to generate sparse vectors for the file. Required for hybrid search. - &#x60;prepend_filename_to_chunks&#x60;: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI&#39;s multimodal model; for text, we support OpenAI&#39;s &#x60;text-embedding-ada-002&#x60; and Cohere&#39;s embed-multilingual-v3.0. The model can be specified via the &#x60;embedding_model&#x60; parameter (in the POST body for &#x60;/embeddings&#x60;, and a query  parameter in &#x60;/uploadfile&#x60;). If no model is supplied, the &#x60;text-embedding-ada-002&#x60; is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. 
For example, if files A and B have embeddings generated with &#x60;OPENAI&#x60;, and files C and D have embeddings generated with &#x60;COHERE_MULTILINGUAL_V3&#x60;, then by default, queries will only consider files A and B. If &#x60;COHERE_MULTILINGUAL_V3&#x60; is specified as the &#x60;embedding_model&#x60; in &#x60;/embeddings&#x60;, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set &#x60;VERTEX_MULTIMODAL&#x60; as an &#x60;embedding_model&#x60;. This model is used automatically by Carbon when it detects an image file.
     # @param file [File]
     # @param body_create_upload_file_uploadfile_post [BodyCreateUploadFileUploadfilePost]
     # @param [Hash] opts the optional parameters
@@ -1512,7 +1512,7 @@ module Carbon
     # @option opts [Boolean] :skip_embedding_generation Flag to control whether or not embeddings should be generated and stored             when processing file. (default to false)
     # @option opts [Boolean] :set_page_as_boundary Flag to control whether or not to set the a page's worth of content as the maximum             amount of content that can appear in a chunk. Only valid for PDFs. See description route description for             more information. (default to false)
     # @option opts [EmbeddingModel] :embedding_model Embedding model that will be used to embed file chunks. (default to 'OPENAI')
-    # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with             tables, images, and/or scanned text. (default to false)
+    # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with             tables, images, and/or scanned text. (default to false)
     # @option opts [Boolean] :generate_sparse_vectors Whether or not to generate sparse vectors for the file. This is *required* for the file to be a             candidate for hybrid search. (default to false)
     # @option opts [Boolean] :prepend_filename_to_chunks Whether or not to prepend the file's name to chunks. (default to false)
     # @option opts [Integer] :max_items_per_chunk Number of objects per chunk. For csv, tsv, xlsx, and json files only.
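
The endpoint description in this file explains that an embedding query only considers files whose embeddings were generated with the queried model, and that `OPENAI` is the default. The selection rule can be illustrated with a small local model. This is a sketch of the described behavior, not Carbon's implementation: `FileRecord`, `FILES`, and `files_considered` are hypothetical names.

```ruby
# Hypothetical record pairing a file name with the model that embedded it.
FileRecord = Struct.new(:name, :embedding_model)

# Default taken from the documented "(default to 'OPENAI')".
DEFAULT_EMBEDDING_MODEL = 'OPENAI'

# The four files from the description's example.
FILES = [
  FileRecord.new('A', 'OPENAI'),
  FileRecord.new('B', 'OPENAI'),
  FileRecord.new('C', 'COHERE_MULTILINGUAL_V3'),
  FileRecord.new('D', 'COHERE_MULTILINGUAL_V3')
].freeze

# Hypothetical helper: names of the files a query would consider, i.e. those
# whose embeddings were generated with the requested model.
def files_considered(files, embedding_model = DEFAULT_EMBEDDING_MODEL)
  files.select { |f| f.embedding_model == embedding_model }.map(&:name)
end

puts files_considered(FILES).inspect
puts files_considered(FILES, 'COHERE_MULTILINGUAL_V3').inspect
```

With no model given, only A and B are selected; naming `COHERE_MULTILINGUAL_V3` selects only C and D, which is why the description advises generating embeddings for all files of interest with the same model.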

data/lib/carbon_ruby_sdk/api/integrations_api.rb CHANGED

@@ -661,7 +661,7 @@ module Carbon
     # @param data_source_id [Integer] Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.
     # @param connecting_new_account [Boolean] Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.
     # @param request_id [String] This request id will be added to all files that get synced using the generated OAuth URL
-    # @param use_ocr [Boolean] Enable OCR for files that support it. Supported formats: png, jpg, pdf
+    # @param use_ocr [Boolean] Enable OCR for files that support it. Supported formats: pdf, jpg, png
     # @param parse_pdf_tables_with_ocr [Boolean]
     # @param enable_file_picker [Boolean] Enable integration's file picker for sources that support it. Supported sources: BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT
     # @param sync_source_items [Boolean] Enabling this flag will fetch all available content from the source to be listed via list items endpoint
@@ -731,7 +731,7 @@ module Carbon
     # @param data_source_id [Integer] Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.
     # @param connecting_new_account [Boolean] Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.
     # @param request_id [String] This request id will be added to all files that get synced using the generated OAuth URL
-    # @param use_ocr [Boolean] Enable OCR for files that support it. Supported formats: png, jpg, pdf
+    # @param use_ocr [Boolean] Enable OCR for files that support it. Supported formats: pdf, jpg, png
     # @param parse_pdf_tables_with_ocr [Boolean]
     # @param enable_file_picker [Boolean] Enable integration's file picker for sources that support it. Supported sources: BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT
     # @param sync_source_items [Boolean] Enabling this flag will fetch all available content from the source to be listed via list items endpoint

data/lib/carbon_ruby_sdk/models/o_auth_url_request.rb CHANGED

@@ -56,7 +56,7 @@ module Carbon
     # This request id will be added to all files that get synced using the generated OAuth URL
     attr_accessor :request_id
-    # Enable OCR for files that support it. Supported formats: png, jpg, pdf
+    # Enable OCR for files that support it. Supported formats: pdf, jpg, png
     attr_accessor :use_ocr
     attr_accessor :parse_pdf_tables_with_ocr

data/lib/carbon_ruby_sdk/version.rb CHANGED

@@ -7,5 +7,5 @@ The version of the OpenAPI document: 1.0.0
 =end
 module Carbon
-  VERSION = '0.2.24'
+  VERSION = '0.2.25'
 end

data/spec/api/files_api_spec.rb CHANGED

@@ -165,7 +165,7 @@ describe 'FilesApi' do
   # unit tests for upload
   # Create Upload File
-  # This endpoint is used to directly upload local files to Carbon. The &#x60;POST&#x60; request should be a multipart form request. Note that the &#x60;set_page_as_boundary&#x60; query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - &#x60;chunk_size&#x60;: the chunk size (in tokens) applied when splitting the document - &#x60;chunk_overlap&#x60;: the chunk overlap (in tokens) applied when splitting the document - &#x60;skip_embedding_generation&#x60;: whether or not to skip the generation of chunks and embeddings - &#x60;set_page_as_boundary&#x60;: described above - &#x60;embedding_model&#x60;: the model used to generate embeddings for the document chunks - &#x60;use_ocr&#x60;: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently) - &#x60;generate_sparse_vectors&#x60;: whether or not to generate sparse vectors for the file. Required for hybrid search. - &#x60;prepend_filename_to_chunks&#x60;: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI&#39;s multimodal model; for text, we support OpenAI&#39;s &#x60;text-embedding-ada-002&#x60; and Cohere&#39;s embed-multilingual-v3.0. The model can be specified via the &#x60;embedding_model&#x60; parameter (in the POST body for &#x60;/embeddings&#x60;, and a query  parameter in &#x60;/uploadfile&#x60;). If no model is supplied, the &#x60;text-embedding-ada-002&#x60; is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. 
For example, if files A and B have embeddings generated with &#x60;OPENAI&#x60;, and files C and D have embeddings generated with &#x60;COHERE_MULTILINGUAL_V3&#x60;, then by default, queries will only consider files A and B. If &#x60;COHERE_MULTILINGUAL_V3&#x60; is specified as the &#x60;embedding_model&#x60; in &#x60;/embeddings&#x60;, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set &#x60;VERTEX_MULTIMODAL&#x60; as an &#x60;embedding_model&#x60;. This model is used automatically by Carbon when it detects an image file.
+  # This endpoint is used to directly upload local files to Carbon. The &#x60;POST&#x60; request should be a multipart form request. Note that the &#x60;set_page_as_boundary&#x60; query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - &#x60;chunk_size&#x60;: the chunk size (in tokens) applied when splitting the document - &#x60;chunk_overlap&#x60;: the chunk overlap (in tokens) applied when splitting the document - &#x60;skip_embedding_generation&#x60;: whether or not to skip the generation of chunks and embeddings - &#x60;set_page_as_boundary&#x60;: described above - &#x60;embedding_model&#x60;: the model used to generate embeddings for the document chunks - &#x60;use_ocr&#x60;: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs - &#x60;generate_sparse_vectors&#x60;: whether or not to generate sparse vectors for the file. Required for hybrid search. - &#x60;prepend_filename_to_chunks&#x60;: whether or not to prepend the filename to the chunk text   Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI&#39;s multimodal model; for text, we support OpenAI&#39;s &#x60;text-embedding-ada-002&#x60; and Cohere&#39;s embed-multilingual-v3.0. The model can be specified via the &#x60;embedding_model&#x60; parameter (in the POST body for &#x60;/embeddings&#x60;, and a query  parameter in &#x60;/uploadfile&#x60;). If no model is supplied, the &#x60;text-embedding-ada-002&#x60; is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. 
For example, if files A and B have embeddings generated with &#x60;OPENAI&#x60;, and files C and D have embeddings generated with &#x60;COHERE_MULTILINGUAL_V3&#x60;, then by default, queries will only consider files A and B. If &#x60;COHERE_MULTILINGUAL_V3&#x60; is specified as the &#x60;embedding_model&#x60; in &#x60;/embeddings&#x60;, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set &#x60;VERTEX_MULTIMODAL&#x60; as an &#x60;embedding_model&#x60;. This model is used automatically by Carbon when it detects an image file.
   # @param file
   # @param body_create_upload_file_uploadfile_post
   # @param [Hash] opts the optional parameters
@@ -174,7 +174,7 @@ describe 'FilesApi' do
   # @option opts [Boolean] :skip_embedding_generation Flag to control whether or not embeddings should be generated and stored             when processing file.
   # @option opts [Boolean] :set_page_as_boundary Flag to control whether or not to set the a page&#39;s worth of content as the maximum             amount of content that can appear in a chunk. Only valid for PDFs. See description route description for             more information.
   # @option opts [EmbeddingModel] :embedding_model Embedding model that will be used to embed file chunks.
-  # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with             tables, images, and/or scanned text.
+  # @option opts [Boolean] :use_ocr Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with             tables, images, and/or scanned text.
   # @option opts [Boolean] :generate_sparse_vectors Whether or not to generate sparse vectors for the file. This is *required* for the file to be a             candidate for hybrid search.
   # @option opts [Boolean] :prepend_filename_to_chunks Whether or not to prepend the file&#39;s name to chunks.
   # @option opts [Integer] :max_items_per_chunk Number of objects per chunk. For csv, tsv, xlsx, and json files only.

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: carbon_ruby_sdk
 version: !ruby/object:Gem::Version
-  version: 0.2.24
+  version: 0.2.25
 platform: ruby
 authors:
 - Konfig
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-08-20 00:00:00.000000000 Z
+date: 2024-08-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday