informers 1.0.2 → 1.1.0
Sign up to get free protection for your applications and to get access to all the features.
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +213 -19
- data/lib/informers/configs.rb +10 -8
- data/lib/informers/model.rb +2 -14
- data/lib/informers/models.rb +1027 -13
- data/lib/informers/pipelines.rb +781 -14
- data/lib/informers/processors.rb +796 -0
- data/lib/informers/tokenizers.rb +166 -4
- data/lib/informers/utils/core.rb +4 -0
- data/lib/informers/utils/generation.rb +294 -0
- data/lib/informers/utils/image.rb +116 -0
- data/lib/informers/utils/math.rb +73 -0
- data/lib/informers/utils/tensor.rb +46 -0
- data/lib/informers/version.rb +1 -1
- data/lib/informers.rb +3 -0
- metadata +8 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: ab4f19adb4d6ca0289784cee6c6cb5235b73a5184abffbeaf44391768be1f0ac
|
4
|
+
data.tar.gz: '0880ce4dced5ce47ceaaa5fee8d10e6324b3fc0a23e05c3da3728414dcc273d9'
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: eb3ee6d16e4e20eca6fae3fae8f97d78ba6bb655d48e2012640d64538785e2a9ff2afb10269cf01db928553438e8fbd08584774ba3f3d08bc25f36cbb971a99a
|
7
|
+
data.tar.gz: '0008441293f2605ec8599135d715093053e21f67f56ba59b730a3bc1f46f04f4a7fabb7fef039f156cd4183011c93b7fc9cab6ba731bf78627244bc4dedcf18d'
|
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -30,10 +30,15 @@ Embedding
|
|
30
30
|
- [intfloat/e5-base-v2](#intfloate5-base-v2)
|
31
31
|
- [nomic-ai/nomic-embed-text-v1](#nomic-ainomic-embed-text-v1)
|
32
32
|
- [BAAI/bge-base-en-v1.5](#baaibge-base-en-v15)
|
33
|
+
- [jinaai/jina-embeddings-v2-base-en](#jinaaijina-embeddings-v2-base-en)
|
34
|
+
- [Snowflake/snowflake-arctic-embed-m-v1.5](#snowflakesnowflake-arctic-embed-m-v15)
|
35
|
+
- [Xenova/all-mpnet-base-v2](#xenovaall-mpnet-base-v2)
|
33
36
|
|
34
|
-
Reranking
|
37
|
+
Reranking
|
35
38
|
|
36
39
|
- [mixedbread-ai/mxbai-rerank-base-v1](#mixedbread-aimxbai-rerank-base-v1)
|
40
|
+
- [jinaai/jina-reranker-v1-turbo-en](#jinaaijina-reranker-v1-turbo-en)
|
41
|
+
- [BAAI/bge-reranker-base](#baaibge-reranker-base)
|
37
42
|
|
38
43
|
### sentence-transformers/all-MiniLM-L6-v2
|
39
44
|
|
@@ -72,18 +77,16 @@ doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
|
|
72
77
|
[Docs](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
|
73
78
|
|
74
79
|
```ruby
|
75
|
-
|
76
|
-
"Represent this sentence for searching relevant passages: #{query}"
|
77
|
-
end
|
80
|
+
query_prefix = "Represent this sentence for searching relevant passages: "
|
78
81
|
|
79
|
-
|
80
|
-
transform_query("puppy"),
|
82
|
+
input = [
|
81
83
|
"The dog is barking",
|
82
|
-
"The cat is purring"
|
84
|
+
"The cat is purring",
|
85
|
+
query_prefix + "puppy"
|
83
86
|
]
|
84
87
|
|
85
88
|
model = Informers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
|
86
|
-
embeddings = model.(
|
89
|
+
embeddings = model.(input)
|
87
90
|
```
|
88
91
|
|
89
92
|
### Supabase/gte-small
|
@@ -102,9 +105,12 @@ embeddings = model.(sentences)
|
|
102
105
|
[Docs](https://huggingface.co/intfloat/e5-base-v2)
|
103
106
|
|
104
107
|
```ruby
|
108
|
+
doc_prefix = "passage: "
|
109
|
+
query_prefix = "query: "
|
110
|
+
|
105
111
|
input = [
|
106
|
-
"
|
107
|
-
"
|
112
|
+
doc_prefix + "Ruby is a programming language created by Matz",
|
113
|
+
query_prefix + "Ruby creator"
|
108
114
|
]
|
109
115
|
|
110
116
|
model = Informers.pipeline("embedding", "intfloat/e5-base-v2")
|
@@ -116,9 +122,13 @@ embeddings = model.(input)
|
|
116
122
|
[Docs](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
|
117
123
|
|
118
124
|
```ruby
|
125
|
+
doc_prefix = "search_document: "
|
126
|
+
query_prefix = "search_query: "
|
127
|
+
|
119
128
|
input = [
|
120
|
-
"
|
121
|
-
"
|
129
|
+
doc_prefix + "The dog is barking",
|
130
|
+
doc_prefix + "The cat is purring",
|
131
|
+
query_prefix + "puppy"
|
122
132
|
]
|
123
133
|
|
124
134
|
model = Informers.pipeline("embedding", "nomic-ai/nomic-embed-text-v1")
|
@@ -130,20 +140,57 @@ embeddings = model.(input)
|
|
130
140
|
[Docs](https://huggingface.co/BAAI/bge-base-en-v1.5)
|
131
141
|
|
132
142
|
```ruby
|
133
|
-
|
134
|
-
"Represent this sentence for searching relevant passages: #{query}"
|
135
|
-
end
|
143
|
+
query_prefix = "Represent this sentence for searching relevant passages: "
|
136
144
|
|
137
145
|
input = [
|
138
|
-
transform_query("puppy"),
|
139
146
|
"The dog is barking",
|
140
|
-
"The cat is purring"
|
147
|
+
"The cat is purring",
|
148
|
+
query_prefix + "puppy"
|
141
149
|
]
|
142
150
|
|
143
151
|
model = Informers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
|
144
152
|
embeddings = model.(input)
|
145
153
|
```
|
146
154
|
|
155
|
+
### jinaai/jina-embeddings-v2-base-en
|
156
|
+
|
157
|
+
[Docs](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)
|
158
|
+
|
159
|
+
```ruby
|
160
|
+
sentences = ["How is the weather today?", "What is the current weather like today?"]
|
161
|
+
|
162
|
+
model = Informers.pipeline("embedding", "jinaai/jina-embeddings-v2-base-en", model_file_name: "../model")
|
163
|
+
embeddings = model.(sentences)
|
164
|
+
```
|
165
|
+
|
166
|
+
### Snowflake/snowflake-arctic-embed-m-v1.5
|
167
|
+
|
168
|
+
[Docs](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)
|
169
|
+
|
170
|
+
```ruby
|
171
|
+
query_prefix = "Represent this sentence for searching relevant passages: "
|
172
|
+
|
173
|
+
input = [
|
174
|
+
"The dog is barking",
|
175
|
+
"The cat is purring",
|
176
|
+
query_prefix + "puppy"
|
177
|
+
]
|
178
|
+
|
179
|
+
model = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
|
180
|
+
embeddings = model.(input, model_output: "sentence_embedding", pooling: "none")
|
181
|
+
```
|
182
|
+
|
183
|
+
### Xenova/all-mpnet-base-v2
|
184
|
+
|
185
|
+
[Docs](https://huggingface.co/Xenova/all-mpnet-base-v2)
|
186
|
+
|
187
|
+
```ruby
|
188
|
+
sentences = ["This is an example sentence", "Each sentence is converted"]
|
189
|
+
|
190
|
+
model = Informers.pipeline("embedding", "Xenova/all-mpnet-base-v2")
|
191
|
+
embeddings = model.(sentences)
|
192
|
+
```
|
193
|
+
|
147
194
|
### mixedbread-ai/mxbai-rerank-base-v1
|
148
195
|
|
149
196
|
[Docs](https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1)
|
@@ -156,6 +203,30 @@ model = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
|
|
156
203
|
result = model.(query, docs)
|
157
204
|
```
|
158
205
|
|
206
|
+
### jinaai/jina-reranker-v1-turbo-en
|
207
|
+
|
208
|
+
[Docs](https://huggingface.co/jinaai/jina-reranker-v1-turbo-en)
|
209
|
+
|
210
|
+
```ruby
|
211
|
+
query = "How many people live in London?"
|
212
|
+
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
|
213
|
+
|
214
|
+
model = Informers.pipeline("reranking", "jinaai/jina-reranker-v1-turbo-en")
|
215
|
+
result = model.(query, docs)
|
216
|
+
```
|
217
|
+
|
218
|
+
### BAAI/bge-reranker-base
|
219
|
+
|
220
|
+
[Docs](https://huggingface.co/BAAI/bge-reranker-base)
|
221
|
+
|
222
|
+
```ruby
|
223
|
+
query = "How many people live in London?"
|
224
|
+
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
|
225
|
+
|
226
|
+
model = Informers.pipeline("reranking", "BAAI/bge-reranker-base")
|
227
|
+
result = model.(query, docs)
|
228
|
+
```
|
229
|
+
|
159
230
|
### Other
|
160
231
|
|
161
232
|
You can use the feature extraction pipeline directly.
|
@@ -165,10 +236,16 @@ model = Informers.pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", quan
|
|
165
236
|
embeddings = model.(sentences, pooling: "mean", normalize: true)
|
166
237
|
```
|
167
238
|
|
168
|
-
The model
|
239
|
+
The model must include a `.onnx` file ([example](https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main/onnx)). If the file is not at `onnx/model.onnx` or `onnx/model_quantized.onnx`, use the `model_file_name` option to specify the location.
|
169
240
|
|
170
241
|
## Pipelines
|
171
242
|
|
243
|
+
- [Text](#text)
|
244
|
+
- [Vision](#vision)
|
245
|
+
- [Multimodal](#multimodal)
|
246
|
+
|
247
|
+
### Text
|
248
|
+
|
172
249
|
Embedding
|
173
250
|
|
174
251
|
```ruby
|
@@ -176,7 +253,7 @@ embed = Informers.pipeline("embedding")
|
|
176
253
|
embed.("We are very happy to show you the 🤗 Transformers library.")
|
177
254
|
```
|
178
255
|
|
179
|
-
Reranking
|
256
|
+
Reranking
|
180
257
|
|
181
258
|
```ruby
|
182
259
|
rerank = Informers.pipeline("reranking")
|
@@ -204,6 +281,48 @@ qa = Informers.pipeline("question-answering")
|
|
204
281
|
qa.("Who invented Ruby?", "Ruby is a programming language created by Matz")
|
205
282
|
```
|
206
283
|
|
284
|
+
Zero-shot classification
|
285
|
+
|
286
|
+
```ruby
|
287
|
+
classifier = Informers.pipeline("zero-shot-classification")
|
288
|
+
classifier.("text", ["label1", "label2", "label3"])
|
289
|
+
```
|
290
|
+
|
291
|
+
Text generation
|
292
|
+
|
293
|
+
```ruby
|
294
|
+
generator = Informers.pipeline("text-generation")
|
295
|
+
generator.("I enjoy walking with my cute dog,")
|
296
|
+
```
|
297
|
+
|
298
|
+
Text-to-text generation
|
299
|
+
|
300
|
+
```ruby
|
301
|
+
text2text = Informers.pipeline("text2text-generation")
|
302
|
+
text2text.("translate from English to French: I'm very happy")
|
303
|
+
```
|
304
|
+
|
305
|
+
Translation
|
306
|
+
|
307
|
+
```ruby
|
308
|
+
translator = Informers.pipeline("translation", "Xenova/nllb-200-distilled-600M")
|
309
|
+
translator.("जीवन एक चॉकलेट बॉक्स की तरह है।", src_lang: "hin_Deva", tgt_lang: "fra_Latn")
|
310
|
+
```
|
311
|
+
|
312
|
+
Summarization
|
313
|
+
|
314
|
+
```ruby
|
315
|
+
summarizer = Informers.pipeline("summarization")
|
316
|
+
summarizer.("Many paragraphs of text")
|
317
|
+
```
|
318
|
+
|
319
|
+
Fill mask
|
320
|
+
|
321
|
+
```ruby
|
322
|
+
unmasker = Informers.pipeline("fill-mask")
|
323
|
+
unmasker.("Paris is the [MASK] of France.")
|
324
|
+
```
|
325
|
+
|
207
326
|
Feature extraction
|
208
327
|
|
209
328
|
```ruby
|
@@ -211,6 +330,80 @@ extractor = Informers.pipeline("feature-extraction")
|
|
211
330
|
extractor.("We are very happy to show you the 🤗 Transformers library.")
|
212
331
|
```
|
213
332
|
|
333
|
+
### Vision
|
334
|
+
|
335
|
+
Image classification
|
336
|
+
|
337
|
+
```ruby
|
338
|
+
classifier = Informers.pipeline("image-classification")
|
339
|
+
classifier.("image.jpg")
|
340
|
+
```
|
341
|
+
|
342
|
+
Zero-shot image classification
|
343
|
+
|
344
|
+
```ruby
|
345
|
+
classifier = Informers.pipeline("zero-shot-image-classification")
|
346
|
+
classifier.("image.jpg", ["label1", "label2", "label3"])
|
347
|
+
```
|
348
|
+
|
349
|
+
Image segmentation
|
350
|
+
|
351
|
+
```ruby
|
352
|
+
segmenter = Informers.pipeline("image-segmentation")
|
353
|
+
segmenter.("image.jpg")
|
354
|
+
```
|
355
|
+
|
356
|
+
Object detection
|
357
|
+
|
358
|
+
```ruby
|
359
|
+
detector = Informers.pipeline("object-detection")
|
360
|
+
detector.("image.jpg")
|
361
|
+
```
|
362
|
+
|
363
|
+
Zero-shot object detection
|
364
|
+
|
365
|
+
```ruby
|
366
|
+
detector = Informers.pipeline("zero-shot-object-detection")
|
367
|
+
detector.("image.jpg", ["label1", "label2", "label3"])
|
368
|
+
```
|
369
|
+
|
370
|
+
Depth estimation
|
371
|
+
|
372
|
+
```ruby
|
373
|
+
estimator = Informers.pipeline("depth-estimation")
|
374
|
+
estimator.("image.jpg")
|
375
|
+
```
|
376
|
+
|
377
|
+
Image-to-image
|
378
|
+
|
379
|
+
```ruby
|
380
|
+
upscaler = Informers.pipeline("image-to-image")
|
381
|
+
upscaler.("image.jpg")
|
382
|
+
```
|
383
|
+
|
384
|
+
Image feature extraction
|
385
|
+
|
386
|
+
```ruby
|
387
|
+
extractor = Informers.pipeline("image-feature-extraction")
|
388
|
+
extractor.("image.jpg")
|
389
|
+
```
|
390
|
+
|
391
|
+
### Multimodal
|
392
|
+
|
393
|
+
Image captioning
|
394
|
+
|
395
|
+
```ruby
|
396
|
+
captioner = Informers.pipeline("image-to-text")
|
397
|
+
captioner.("image.jpg")
|
398
|
+
```
|
399
|
+
|
400
|
+
Document question answering
|
401
|
+
|
402
|
+
```ruby
|
403
|
+
qa = Informers.pipeline("document-question-answering")
|
404
|
+
qa.("image.jpg", "What is the invoice number?")
|
405
|
+
```
|
406
|
+
|
214
407
|
## Credits
|
215
408
|
|
216
409
|
This library was ported from [Transformers.js](https://github.com/xenova/transformers.js) and is available under the same license.
|
@@ -250,5 +443,6 @@ To get started with development:
|
|
250
443
|
git clone https://github.com/ankane/informers.git
|
251
444
|
cd informers
|
252
445
|
bundle install
|
446
|
+
bundle exec rake download:files
|
253
447
|
bundle exec rake test
|
254
448
|
```
|
data/lib/informers/configs.rb
CHANGED
@@ -1,17 +1,19 @@
|
|
1
1
|
module Informers
|
2
2
|
class PretrainedConfig
|
3
|
-
attr_reader :model_type, :problem_type, :id2label
|
4
|
-
|
5
3
|
def initialize(config_json)
|
6
|
-
@
|
7
|
-
|
8
|
-
@model_type = config_json["model_type"]
|
9
|
-
@problem_type = config_json["problem_type"]
|
10
|
-
@id2label = config_json["id2label"]
|
4
|
+
@config_json = config_json.to_h
|
11
5
|
end
|
12
6
|
|
13
7
|
def [](key)
|
14
|
-
|
8
|
+
@config_json[key.to_s]
|
9
|
+
end
|
10
|
+
|
11
|
+
def []=(key, value)
|
12
|
+
@config_json[key.to_s] = value
|
13
|
+
end
|
14
|
+
|
15
|
+
def to_h
|
16
|
+
@config_json.to_h
|
15
17
|
end
|
16
18
|
|
17
19
|
def self.from_pretrained(
|
data/lib/informers/model.rb
CHANGED
@@ -1,24 +1,12 @@
|
|
1
1
|
module Informers
|
2
2
|
class Model
|
3
3
|
def initialize(model_id, quantized: false)
|
4
|
-
@model_id = model_id
|
5
4
|
@model = Informers.pipeline("embedding", model_id, quantized: quantized)
|
5
|
+
@options = model_id == "mixedbread-ai/mxbai-embed-large-v1" ? {pooling: "cls", normalize: false} : {}
|
6
6
|
end
|
7
7
|
|
8
8
|
def embed(texts)
|
9
|
-
|
10
|
-
texts = [texts] unless is_batched
|
11
|
-
|
12
|
-
case @model_id
|
13
|
-
when "sentence-transformers/all-MiniLM-L6-v2", "Xenova/all-MiniLM-L6-v2", "Xenova/multi-qa-MiniLM-L6-cos-v1", "Supabase/gte-small"
|
14
|
-
output = @model.(texts)
|
15
|
-
when "mixedbread-ai/mxbai-embed-large-v1"
|
16
|
-
output = @model.(texts, pooling: "cls", normalize: false)
|
17
|
-
else
|
18
|
-
raise Error, "Use the embedding pipeline for this model: #{@model_id}"
|
19
|
-
end
|
20
|
-
|
21
|
-
is_batched ? output : output[0]
|
9
|
+
@model.(texts, **@options)
|
22
10
|
end
|
23
11
|
end
|
24
12
|
end
|