gliner 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: edbbf72af6499d1823db172793f3dc183f90e445d4ae45587dc1a0cd4005c15e
-   data.tar.gz: 04d0881c074e9e84617591768c1b53767d2f509bbb7497c6bc0920bf3bac96b3
+   metadata.gz: a8e4bdddc47289f29cce2469ff6bbeecc9ef260b39f3b02f6e3e89b23e3e4e7a
+   data.tar.gz: b59ad90385c8da5b478f007f5715994c11f1f1a0b85617acc501acb8b17723aa
  SHA512:
-   metadata.gz: 0ea1480d83e383534c50d14320f5cbfcc7c7a890101f9e0492cbd486dc36e05fbebaedaeea31ccf4f8b5efb3f553349cee1b25ad890457734443054d7e1dfb91
-   data.tar.gz: da1f798fd89bcaad28a41e67ec2f07e779e8a7d2681ac3d4790f2e27b2adfcfcc02f5c42e7b02c8eb7bc519fea4178f16bad6050576543dde241e8becd18dbec
+   metadata.gz: 072cc980f4653d74da83d3cfea1b09d0cbf8e023bfb6d3829ce6879163b7dc77a170ef63ad68138eca7d892521f30eda8d0be6c72e62cbe0353732aa4542bab3
+   data.tar.gz: fb84732285ff71edad266533cfd513d51036c709a4c5275fdc6548a373f11ead86eb66978fd0b28522c567af52e7d2ce8111053d721ec6ddcee6546db8bf58ed
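The digests in `checksums.yaml` can be compared against a locally extracted gem component. The following is a minimal sketch, assuming `metadata.gz` or `data.tar.gz` has already been unpacked from the `.gem` archive; `digest_matches?` is a hypothetical helper name, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of `bytes`
# against an expected hex string such as one listed in checksums.yaml.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Sanity check with a well-known test vector (SHA256 of "abc").
abc_digest = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
puts digest_matches?('abc', abc_digest)

# Against a real component (the path is an assumption):
#   digest_matches?(File.binread('data.tar.gz'), expected_sha256)
```

The SHA512 entries could be checked the same way by swapping in `Digest::SHA512`.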
data/README.md CHANGED
@@ -1,9 +1,9 @@
- # Gliner
+ # GLiNER
+ [![tests](https://github.com/elcuervo/gliner/actions/workflows/tests.yml/badge.svg)](https://github.com/elcuervo/gliner/actions/workflows/tests.yml)
+ ![Gem Version](https://img.shields.io/gem/v/gliner)

  ![](https://images.unsplash.com/photo-1625768376503-68d2495d78c5?q=80&w=2225&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D)

- Minimal Ruby inference wrapper for the **GLiNER2** ONNX model using:
-
  ## Install

  ```ruby
@@ -11,63 +11,65 @@ gem "gliner"
  ```

  ## Usage
- ### entities
+
+ ### Entities

  ```ruby
  require "gliner"

- Gliner.load("path/to/gliner2-multi-v1")
+ Gliner.configure do |config|
+   config.threshold = 0.2
+   # If unset, auto! downloads the default model to .cache/
+   # Or set a local path explicitly:
+   # config.model = "/path/to/gliner2-multi-v1"
+   config.variant = :fp16
+   config.auto!
+ end

  text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
  labels = ["company", "person", "product", "location"]

  model = Gliner[labels]
  pp model[text]
- ```

- Expected shape:
-
- ```ruby
- {"entities"=>{"company"=>["Apple"], "person"=>["Tim Cook"], "product"=>["iPhone 15"], "location"=>["Cupertino"]}}
+ # => {"company"=>["Apple"], "person"=>["Tim Cook"], "product"=>["iPhone 15"], "location"=>["Cupertino"]}
  ```

  You can also pass per-entity configs:

  ```ruby
  labels = {
-   "email" => { "description" => "Email addresses", "dtype" => "list", "threshold" => 0.9 },
-   "person" => { "description" => "Person names", "dtype" => "str" }
+   email: { description: "Email addresses", dtype: "list", threshold: 0.9 },
+   person: { description: "Person names", dtype: "str" }
  }

  model = Gliner[labels]
  pp model["Email John Doe at john@example.com.", threshold: 0.5]
+
+ # => {"email"=>["john@example.com"], "person"=>"John Doe"}
  ```

- ### classification
+ ### Classification

  ```ruby
  model = Gliner.classify[
-   { "sentiment" => %w[positive negative neutral] }
+   { sentiment: %w[positive negative neutral] }
  ]

  result = model["This laptop has amazing performance but terrible battery life!"]

  pp result
- ```

- Expected shape:
-
- ```ruby
- {"sentiment"=>"negative"}
+ # => {"sentiment"=>"negative"}
  ```

- ### structured extraction
+ ### Structured extraction

  ```ruby
  text = "iPhone 15 Pro Max with 256GB storage, A17 Pro chip, priced at $1199."

  structure = {
-   "product" => [
+   product: [
      "name::str::Full product name and model",
      "storage::str::Storage capacity",
      "processor::str::Chip or processor information",
@@ -78,18 +80,16 @@ structure = {
  result = Gliner[structure][text]

  pp result
- ```
-
- Expected shape:

- ```ruby
- {"product"=>[{"name"=>"iPhone 15 Pro Max", "storage"=>"256GB", "processor"=>"A17 Pro chip", "price"=>"$1199"}]}
+ # => {"product"=>[{"name"=>"iPhone 15 Pro Max", "storage"=>"256GB", "processor"=>"A17 Pro", "price"=>"1199"}]}
  ```

  Choices can be included in field specs:

  ```ruby
- result = Gliner[{ "order" => ["status::[pending|processing|shipped]::str"] }]["Status: shipped"]
+ result = Gliner[{ order: ["status::[pending|processing|shipped]::str"] }]["Status: shipped"]
+
+ # => {"order"=>[{"status"=>"shipped"}]}
  ```

  ## Model files
@@ -97,10 +97,21 @@ result = Gliner[{ "order" => ["status::[pending|processing|shipped]::str"] }]["S
  This implementation expects a directory containing:

  - `tokenizer.json`
- - `model.onnx` or `model_int8.onnx`
+ - `model.onnx`, `model_fp16.onnx`, or `model_int8.onnx`
  - (optional) `config.json` with `max_width` and `max_seq_len`

  One publicly available ONNX export is `cuerbot/gliner2-multi-v1` on Hugging Face.
+ By default, `model_fp16.onnx` is used; set `config.variant` (or `GLINER_MODEL_FILE`) to override.
+ Variants map to files as: `:fp16` → `model_fp16.onnx`, `:fp32` → `model.onnx`, `:int8` → `model_int8.onnx`.
+
+ You can also configure the model source directly:
+
+ ```ruby
+ Gliner.configure do |config|
+   config.model = "/path/to/model_dir"
+   config.variant = :int8
+ end
+ ```

  ## Integration test

@@ -135,7 +146,7 @@ If you omit `MODEL_DIR`, the console auto-downloads a public test model (configu
  ```bash
  rake console
  # or:
- GLINER_REPO_ID=cuerbot/gliner2-multi-v1 GLINER_MODEL_FILE=model_int8.onnx rake console
+ GLINER_REPO_ID=cuerbot/gliner2-multi-v1 GLINER_MODEL_FILE=model_fp16.onnx rake console
  ```

  Or:
data/bin/console CHANGED
@@ -12,13 +12,15 @@ require "httpx"
  require "irb"

  DEFAULT_REPO_ID = "cuerbot/gliner2-multi-v1"
- DEFAULT_MODEL_FILE = "model_int8.onnx"
+ DEFAULT_MODEL_FILE = "model_fp16.onnx"
+ DEFAULT_MODEL_SUBDIR = "onnx"

- def ensure_model_dir!(repo_id:, model_file:)
+ def ensure_model_dir!(repo_id:, model_file:, model_subdir:)
    dir = File.expand_path("../tmp/models/#{repo_id.tr('/', '__')}", __dir__)
    FileUtils.mkdir_p(dir)

    base = "https://huggingface.co/#{repo_id}/resolve/main"
+   base = "#{base}/#{model_subdir}" unless model_subdir.nil? || model_subdir.empty?
    files = ["tokenizer.json", "config.json", model_file]

    files.each do |file|
@@ -40,13 +42,14 @@ end
  model_dir = ARGV[0] || ENV["GLINER_MODEL_DIR"]
  repo_id = ENV["GLINER_REPO_ID"] || DEFAULT_REPO_ID
  model_file = ENV["GLINER_MODEL_FILE"] || DEFAULT_MODEL_FILE
+ model_subdir = ENV["GLINER_MODEL_SUBDIR"] || DEFAULT_MODEL_SUBDIR

  if model_dir && !model_dir.empty?
    $gliner_model = Gliner.load(model_dir, file: model_file)
  else
    begin
      require "fileutils"
-     model_dir = ensure_model_dir!(repo_id: repo_id, model_file: model_file)
+     model_dir = ensure_model_dir!(repo_id: repo_id, model_file: model_file, model_subdir: model_subdir)
      $gliner_model = Gliner.load(model_dir, file: model_file)
    rescue => e
      warn "No model loaded (auto-download failed: #{e.class}: #{e.message})"
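The new `model_subdir` argument changes how the console builds download URLs. As a rough illustration, the sketch below isolates that logic; `model_file_url` is a hypothetical helper, and the final `"#{base}/#{file}"` join is an assumption based on how `lib/gliner.rb` fetches files, since the loop body is not shown in this hunk.

```ruby
# Hypothetical helper mirroring the base-URL logic in bin/console.
def model_file_url(repo_id:, file:, subdir: nil)
  base = "https://huggingface.co/#{repo_id}/resolve/main"
  # A nil or empty subdir leaves the URL at the repository root.
  base = "#{base}/#{subdir}" unless subdir.nil? || subdir.empty?
  "#{base}/#{file}"
end

puts model_file_url(repo_id: 'cuerbot/gliner2-multi-v1', file: 'model_fp16.onnx', subdir: 'onnx')
puts model_file_url(repo_id: 'cuerbot/gliner2-multi-v1', file: 'model_fp16.onnx', subdir: '')
```

Setting `GLINER_MODEL_SUBDIR` to an empty string would therefore fall back to the pre-0.2.0 layout, where files sit at the repository root.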
data/gliner.gemspec CHANGED
@@ -11,16 +11,18 @@ Gem::Specification.new do |spec|
    spec.description = 'Basic Ruby inference wrapper for the GLiNER2 ONNX model.'
    spec.homepage = 'https://github.com/elcuervo/gliner'
    spec.license = 'MIT'
-   spec.required_ruby_version = '>= 3.2'
+   spec.required_ruby_version = '>= 3.3'

    spec.files = Dir.glob('lib/**/*') + Dir.glob('bin/*') + %w[README.md LICENSE gliner.gemspec]
    spec.require_paths = ['lib']

+   spec.add_dependency 'httpx', '~> 1.0'
    spec.add_dependency 'onnxruntime', '~> 0.10'
    spec.add_dependency 'tokenizers', '~> 0.6'

-   spec.add_development_dependency 'httpx', '~> 1.0'
    spec.add_development_dependency 'rake', '~> 13.0'
    spec.add_development_dependency 'rspec', '~> 3.13'
    spec.add_development_dependency 'rubocop', '~> 1.50'
+
+   spec.metadata['rubygems_mfa_required'] = 'true'
  end
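This hunk promotes `httpx` to a runtime dependency and raises the Ruby floor to 3.3. To see what those constraints admit, the sketch below evaluates them with RubyGems' own `Gem::Requirement`; the version numbers being tested are arbitrary examples, not versions known to be published.

```ruby
require 'rubygems'

httpx_req = Gem::Requirement.new('~> 1.0')   # pessimistic: >= 1.0 and < 2.0
onnx_req  = Gem::Requirement.new('~> 0.10')  # pessimistic: >= 0.10 and < 1.0
ruby_req  = Gem::Requirement.new('>= 3.3')

puts httpx_req.satisfied_by?(Gem::Version.new('1.4.2'))  # inside the 1.x range
puts httpx_req.satisfied_by?(Gem::Version.new('2.0.0'))  # next major is excluded
puts onnx_req.satisfied_by?(Gem::Version.new('0.11.0'))  # still below 1.0
puts ruby_req.satisfied_by?(Gem::Version.new('3.2.9'))   # below the new floor
```

In practice this means projects still on Ruby 3.2 would stay on gliner 0.1.0 unless they upgrade their interpreter.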
data/lib/gliner/configuration.rb ADDED
@@ -0,0 +1,33 @@
+ # frozen_string_literal: true
+
+ module Gliner
+   class Configuration
+     DEFAULT_THRESHOLD = 0.5
+
+     attr_accessor :threshold, :model
+     attr_reader :variant
+
+     def initialize
+       @threshold = DEFAULT_THRESHOLD
+       @model = nil
+       @variant = :fp16
+       @auto = false
+     end
+
+     def variant=(value)
+       @variant = value&.to_sym
+     end
+
+     def auto!(value = true)
+       @auto = !!value
+     end
+
+     def auto=(value)
+       @auto = !!value
+     end
+
+     def auto?
+       @auto
+     end
+   end
+ end
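The setters in the new configuration class coerce their input, which is easy to miss when reading the diff. The sketch below uses `SketchConfiguration`, a standalone stand-in trimmed from the class above so that it runs without the gem installed; it is not the gem's own class.

```ruby
# Standalone stand-in for Gliner::Configuration, reduced to the
# variant and auto settings shown in the diff.
class SketchConfiguration
  attr_reader :variant

  def initialize
    @variant = :fp16
    @auto = false
  end

  # Strings are coerced to symbols; nil is stored as nil.
  def variant=(value)
    @variant = value&.to_sym
  end

  # Any truthy or falsy value is normalised to true or false.
  def auto!(value = true)
    @auto = !!value
  end

  def auto?
    @auto
  end
end

config = SketchConfiguration.new
config.variant = 'int8'
config.auto!
p config.variant # => :int8
p config.auto?   # => true
```

One consequence worth noting: assigning `nil` leaves `variant` as `nil`, and the `variant.to_sym` call in `model_file_for_variant` (in `lib/gliner.rb`) would then raise `NoMethodError` rather than `Gliner::Error`.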
data/lib/gliner/model.rb CHANGED
@@ -22,7 +22,7 @@ module Gliner
      DEFAULT_MAX_WIDTH = 8
      DEFAULT_MAX_SEQ_LEN = 512

-     def self.from_dir(dir, file: 'model_int8.onnx')
+     def self.from_dir(dir, file: 'model_fp16.onnx')
        config_path = File.join(dir, 'config.json')
        config = File.exist?(config_path) ? JSON.parse(File.read(config_path)) : {}

@@ -100,7 +100,7 @@ module Gliner
      end

      def extract_entities(text, entity_types, **options)
-       threshold = options.fetch(:threshold, 0.5)
+       threshold = options.fetch(:threshold, Gliner.config.threshold)
        include_confidence = options.fetch(:include_confidence, false)
        include_spans = options.fetch(:include_spans, false)

@@ -125,7 +125,7 @@ module Gliner
      end

      def extract_json(text, structures, **options)
-       threshold = options.fetch(:threshold, 0.5)
+       threshold = options.fetch(:threshold, Gliner.config.threshold)
        include_confidence = options.fetch(:include_confidence, false)
        include_spans = options.fetch(:include_spans, false)

@@ -29,7 +29,7 @@ module Gliner
      end

      def process_output(logits, parsed, prepared, options)
-       threshold = options.fetch(:threshold, 0.5)
+       threshold = options.fetch(:threshold, Gliner.config.threshold)
        format_opts = FormatOptions.from(options)
        label_positions = options[:label_positions] || inference.label_positions_for(prepared.word_ids, parsed[:labels].length)

@@ -83,7 +83,7 @@ module Gliner
          parsed[:labels],
          label_positions,
          prepared,
-         threshold: options.fetch(:threshold, 0.5)
+         threshold: options.fetch(:threshold, Gliner.config.threshold)
        )
      end
    end
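Every threshold lookup in these hunks now falls back to the global configuration instead of a hard-coded `0.5`. The sketch below shows how `Hash#fetch` resolves that fallback; `GLOBAL_THRESHOLD` and `effective_threshold` are illustrative stand-ins for `Gliner.config.threshold` and the inline `options.fetch` calls.

```ruby
# Stand-in for Gliner.config.threshold; the value is illustrative.
GLOBAL_THRESHOLD = 0.2

# Per-call options win; the global value applies only when the key is absent.
def effective_threshold(options)
  options.fetch(:threshold, GLOBAL_THRESHOLD)
end

p effective_threshold({})                 # => 0.2
p effective_threshold({ threshold: 0.9 }) # => 0.9
p effective_threshold({ threshold: nil }) # => nil
```

The last case is a property of `Hash#fetch` itself: an explicit `threshold: nil` counts as a present key, so the global default is not applied.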
data/lib/gliner/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Gliner
-   VERSION = '0.1.0'
+   VERSION = '0.2.0'
  end
data/lib/gliner.rb CHANGED
@@ -1,6 +1,9 @@
  # frozen_string_literal: true

+ require 'fileutils'
+ require 'httpx'
  require 'gliner/version'
+ require 'gliner/configuration'
  require 'gliner/model'
  require 'gliner/runners/prepared_task'
  require 'gliner/runners/entity_runner'
@@ -8,6 +11,11 @@ require 'gliner/runners/structured_runner'
  require 'gliner/runners/classification_runner'

  module Gliner
+   HF_REPO = 'cuerbot/gliner2-multi-v1'
+   HF_DIR = 'onnx'
+
+   DEFAULT_MODEL_BASE = "https://huggingface.co/#{HF_REPO}/resolve/main/#{HF_DIR}".freeze
+
    Error = Class.new(StandardError)

    PreparedInput = Data.define(
@@ -40,43 +48,62 @@ module Gliner
    end

    class << self
-     attr_writer :model
+     attr_writer :model, :config

-     def load(dir, file: 'model_int8.onnx')
-       self.model = Model.from_dir(dir, file: file)
+     def configure
+       yield(config)
+
+       reset_model!
+       apply_model_source!
      end

-     def model
-       @model ||= model_from_env
+     def config
+       @config ||= Configuration.new
      end

-     def model!
-       fetch_model!
+     def load(dir, file: nil)
+       file ||= ENV['GLINER_MODEL_FILE'] || model_file_for_variant(config.variant)
+
+       self.model = Model.from_dir(dir, file: file)
+     end
+
+     def model
+       @model ||= model_from_config || model_from_env
      end

      def [](config)
-       runner_for(config).new(fetch_model!, config)
+       runner_for(config).new(model!, config)
      end

      def classify
        Runners::ClassificationRunner
      end

+     def model!
+       model = self.model
+
+       return model if model
+
+       raise Error, 'No model loaded. Call Gliner.load("/path/to/model"), set config.model, or set GLINER_MODEL_DIR.'
+     end
+
      private

-     def model_from_env
-       dir = ENV.fetch('GLINER_MODEL_DIR', nil)
-       return nil if dir.nil? || dir.empty?
+     def model_from_config
+       source = config.model
+       return nil if source.nil?

-       file = ENV['GLINER_MODEL_FILE'] || 'model_int8.onnx'
-       Model.from_dir(dir, file: file)
+       file = model_file_for_variant(config.variant)
+       Model.from_dir(source, file: file)
      end

-     def fetch_model!
-       model = self.model
-       return model if model
+     def model_from_env
+       dir = ENV.fetch('GLINER_MODEL_DIR', nil)
+       return if dir.nil?

-       raise Error, 'No model loaded. Call Gliner.load("/path/to/model") or set GLINER_MODEL_DIR.'
+       file = ENV['GLINER_MODEL_FILE'] || model_file_for_variant(config.variant)
+
+       Model.from_dir(dir, file: file)
      end

      def runner_for(config)
@@ -89,9 +116,53 @@ module Gliner
        return false unless config.is_a?(Hash)

        keys = config.transform_keys(&:to_s)
+
        return true if keys.key?('name') && keys.key?('fields')

        config.values.all? { |value| value.is_a?(Array) }
      end
+
+     def reset_model!
+       @model = nil
+     end
+
+     def apply_model_source!
+       return unless config.auto?
+
+       source = config.model
+       return unless source.nil? || source.empty?
+
+       config.model = download_default_model
+     end
+
+     def download_default_model
+       model_file = model_file_for_variant(config.variant)
+       root = File.expand_path('..', __dir__)
+       dir = File.join(root, '.cache', 'models', HF_REPO.tr('/', '__'))
+
+       FileUtils.mkdir_p(dir)
+
+       files = ['tokenizer.json', 'config.json', model_file]
+       client = HTTPX.plugin(:follow_redirects)
+
+       files.each do |file|
+         response = client.get("#{DEFAULT_MODEL_BASE}/#{file}")
+         raise Error, "Download failed: #{file}" if response.error?
+
+         File.binwrite(File.join(dir, file), response.body.to_s)
+       end
+
+       dir
+     end
+
+     def model_file_for_variant(variant = :fp16)
+       case variant.to_sym
+       when :fp16 then 'model_fp16.onnx'
+       when :fp32 then 'model.onnx'
+       when :int8 then 'model_int8.onnx'
+       else
+         raise Error, "Unknown model variant: #{variant.inspect}"
+       end
+     end
    end
  end
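In the 0.2.0 code above, `Gliner.load` picks the ONNX file in a fixed order: an explicit `file:` argument, then `GLINER_MODEL_FILE`, then the configured variant. `model_from_config`, by contrast, consults only the variant. The sketch below reproduces the `load` ordering in isolation; `resolve_model_file` and `VARIANT_FILES` are hypothetical names, and `ArgumentError` stands in for `Gliner::Error` so the example runs without the gem.

```ruby
# Variant-to-file mapping as listed in the README and model_file_for_variant.
VARIANT_FILES = {
  fp16: 'model_fp16.onnx',
  fp32: 'model.onnx',
  int8: 'model_int8.onnx'
}.freeze

# Hypothetical helper: explicit file, then environment, then variant.
# `env` is injectable so the lookup can be exercised without touching ENV.
def resolve_model_file(file: nil, env: ENV, variant: :fp16)
  file || env['GLINER_MODEL_FILE'] || VARIANT_FILES.fetch(variant.to_sym) {
    raise ArgumentError, "Unknown model variant: #{variant.inspect}"
  }
end

p resolve_model_file(env: {}, variant: :int8)
p resolve_model_file(env: { 'GLINER_MODEL_FILE' => 'model.onnx' }, variant: :int8)
p resolve_model_file(file: 'custom.onnx', env: { 'GLINER_MODEL_FILE' => 'model.onnx' })
```

Because the environment variable outranks the variant in `load`, a stray `GLINER_MODEL_FILE` in the shell would silently override `config.variant` on that code path.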
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gliner
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.0
  platform: ruby
  authors:
  - elcuervo
@@ -10,47 +10,47 @@ cert_chain: []
  date: 1980-01-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
-   name: onnxruntime
+   name: httpx
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.10'
+         version: '1.0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.10'
+         version: '1.0'
  - !ruby/object:Gem::Dependency
-   name: tokenizers
+   name: onnxruntime
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.6'
+         version: '0.10'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '0.6'
+         version: '0.10'
  - !ruby/object:Gem::Dependency
-   name: httpx
+   name: tokenizers
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '1.0'
-   type: :development
+         version: '0.6'
+   type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '1.0'
+         version: '0.6'
  - !ruby/object:Gem::Dependency
    name: rake
    requirement: !ruby/object:Gem::Requirement
@@ -108,6 +108,7 @@ files:
  - lib/gliner/config/entity_types.rb
  - lib/gliner/config/field_spec.rb
  - lib/gliner/config_parser.rb
+ - lib/gliner/configuration.rb
  - lib/gliner/inference.rb
  - lib/gliner/inference/session_validator.rb
  - lib/gliner/input_builder.rb
@@ -129,7 +130,8 @@ files:
  homepage: https://github.com/elcuervo/gliner
  licenses:
  - MIT
- metadata: {}
+ metadata:
+   rubygems_mfa_required: 'true'
  rdoc_options: []
  require_paths:
  - lib
@@ -137,14 +139,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: '3.2'
+       version: '3.3'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.7.2
+ rubygems_version: 3.6.9
  specification_version: 4
  summary: Schema-based information extraction (GLiNER2) via ONNX Runtime
  test_files: []