informers 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7fab4014ceee446289bf0fb5a3c5b32630462eddba5c1b10e2104c3c42ee43ea
- data.tar.gz: a09cdc2dc9676a91a5e6c0ab0c3712653b99d2dbae1fe91b1ac55b32777c8cb9
+ metadata.gz: 30960cffae248b704482b2faaa2d573cb9d9cf491543d8c9593b8c937d997a9f
+ data.tar.gz: c6ce6d049ae38eb6a154fb7cd2fbef2cd709a71663f2ffc6951869687ae779e7
  SHA512:
- metadata.gz: eb4693b6ff9cd60ccdaf727de793a90b0b8576bdc57376a22e78759032144c4bc470b9750bdc38e2288229370dd43f5ca982b407d553ea7ca73b5fff0d5a9e3a
- data.tar.gz: 7ad9384587b2c12ff09d21c4f1074b297758c7ef65142bfc19bb730e3034d7b9febd180253068b6181814803d6d599e4685a4ef736c734b1b45c41ee700fd3ec
+ metadata.gz: eb3382ec97e9ffbf7dbada8440290c2c7a2155574d2b5c3a14357dffcf19ac8b36256a0ae2265923cdfb6efed30cc1d1f3106576edfeaa652853c42d18f80063
+ data.tar.gz: 13bc7da32218b600d49d0289dfcb258b25a2e3cce4793398eb92e8e48c2886cfcf7f944e7f666fa5d4170c0db8eade00599d4d4333974eea5e285b0ecf946b5d
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+ ## 0.1.2 (2020-11-24)
+
+ - Added feature extraction
+
  ## 0.1.1 (2020-10-05)
 
  - Fixed question answering for Ruby < 2.7
data/README.md CHANGED
@@ -11,7 +11,7 @@ Supports:
  - Summarization - *in development*
  - Translation - *in development*
 
- [![Build Status](https://travis-ci.org/ankane/informers.svg?branch=master)](https://travis-ci.org/ankane/informers)
+ [![Build Status](https://github.com/ankane/informers/workflows/build/badge.svg?branch=master)](https://github.com/ankane/informers/actions)
 
  ## Installation
 
@@ -106,11 +106,19 @@ This returns
  Task | Description | Contributor | License | Link
  --- | --- | --- | --- | ---
  Sentiment analysis | DistilBERT fine-tuned on SST-2 | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
- Question answering | DistilBERT | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-cased-distilled-squad)
+ Question answering | DistilBERT fine-tuned on SQuAD | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-cased-distilled-squad)
  Named-entity recognition | BERT fine-tuned on CoNLL03 | Bayerische Staatsbibliothek | In-progress | [Link](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)
 
  Models are [quantized](https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7) to make them faster and smaller.
 
+ ## Deployment
+
+ Check out [Trove](https://github.com/ankane/trove) for deploying models.
+
+ ```sh
+ trove push sentiment-analysis.onnx
+ ```
+
  ## Credits
 
  This project uses many state-of-the-art technologies:
data/lib/informers.rb CHANGED
@@ -3,6 +3,7 @@ require "blingfire"
  require "onnxruntime"
 
  # modules
+ require "informers/feature_extraction"
  require "informers/ner"
  require "informers/question_answering"
  require "informers/sentiment_analysis"
data/lib/informers/feature_extraction.rb ADDED
@@ -0,0 +1,59 @@
+ # Copyright 2018 The HuggingFace Inc. team.
+ # Copyright 2020 Andrew Kane.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ module Informers
+   class FeatureExtraction
+     def initialize(model_path)
+       tokenizer_path = File.expand_path("../../vendor/bert_base_cased_tok.bin", __dir__)
+       @tokenizer = BlingFire.load_model(tokenizer_path)
+       @model = OnnxRuntime::Model.new(model_path)
+     end
+
+     def predict(texts)
+       singular = !texts.is_a?(Array)
+       texts = [texts] if singular
+
+       # tokenize
+       input_ids =
+         texts.map do |text|
+           tokens = @tokenizer.text_to_ids(text, nil, 100) # unk token
+           tokens.unshift(101) # cls token
+           tokens << 102 # sep token
+           tokens
+         end
+
+       max_tokens = input_ids.map(&:size).max
+       attention_mask = []
+       input_ids.each do |ids|
+         zeros = [0] * (max_tokens - ids.size)
+
+         mask = ([1] * ids.size) + zeros
+         attention_mask << mask
+
+         ids.concat(zeros)
+       end
+
+       # infer
+       input = {
+         input_ids: input_ids,
+         attention_mask: attention_mask
+       }
+       output = @model.predict(input)
+       scores = output["output_0"]
+
+       singular ? scores.first : scores
+     end
+   end
+ end
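
The batching step in `predict` above can be tried on its own, without a tokenizer or an ONNX model. The following is a minimal sketch, not the gem's code: the helper names `add_special_tokens` and `pad_batch` and the sample token ids are hypothetical, while the ids 101, 102, and 0 are the ones the diff uses for the CLS token, the SEP token, and padding. Unlike the original, this version returns new arrays instead of mutating its input.

```ruby
# Hypothetical helpers illustrating the padding logic from the diff.
CLS_ID = 101 # cls token id, as in the diff
SEP_ID = 102 # sep token id, as in the diff
PAD_ID = 0   # the diff pads sequences with zeros

# Wrap each sequence of token ids with the CLS and SEP markers.
def add_special_tokens(sequences)
  sequences.map { |ids| [CLS_ID, *ids, SEP_ID] }
end

# Right-pad every sequence to the longest length in the batch and build
# the matching attention mask (1 for real tokens, 0 for padding).
def pad_batch(sequences)
  max_tokens = sequences.map(&:size).max || 0
  input_ids = sequences.map { |ids| ids + [PAD_ID] * (max_tokens - ids.size) }
  attention_mask = sequences.map { |ids| [1] * ids.size + [0] * (max_tokens - ids.size) }
  [input_ids, attention_mask]
end

# Made-up token ids standing in for tokenizer output.
batch = add_special_tokens([[7, 8, 9], [7]])
input_ids, attention_mask = pad_batch(batch)
p input_ids      # [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
p attention_mask # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

In the gem itself, the entry point shown in the diff is `Informers::FeatureExtraction.new(model_path).predict(texts)`, which returns a single result for a string input and an array of results for an array input.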
data/lib/informers/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Informers
-   VERSION = "0.1.1"
+   VERSION = "0.1.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: informers
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.1.2
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-10-05 00:00:00.000000000 Z
+ date: 2020-11-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: blingfire
@@ -94,7 +94,7 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- description:
+ description:
  email: andrew@chartkick.com
  executables: []
  extensions: []
@@ -104,6 +104,7 @@ files:
  - LICENSE.txt
  - README.md
  - lib/informers.rb
+ - lib/informers/feature_extraction.rb
  - lib/informers/ner.rb
  - lib/informers/question_answering.rb
  - lib/informers/sentiment_analysis.rb
@@ -116,7 +117,7 @@ homepage: https://github.com/ankane/informers
  licenses:
  - Apache-2.0
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -131,8 +132,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.1.2
- signing_key:
+ rubygems_version: 3.1.4
+ signing_key:
  specification_version: 4
  summary: State-of-the-art natural language processing for Ruby
  test_files: []