informers 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +10 -2
- data/lib/informers.rb +1 -0
- data/lib/informers/feature_extraction.rb +59 -0
- data/lib/informers/version.rb +1 -1
- metadata +8 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 30960cffae248b704482b2faaa2d573cb9d9cf491543d8c9593b8c937d997a9f
+  data.tar.gz: c6ce6d049ae38eb6a154fb7cd2fbef2cd709a71663f2ffc6951869687ae779e7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb3382ec97e9ffbf7dbada8440290c2c7a2155574d2b5c3a14357dffcf19ac8b36256a0ae2265923cdfb6efed30cc1d1f3106576edfeaa652853c42d18f80063
+  data.tar.gz: 13bc7da32218b600d49d0289dfcb258b25a2e3cce4793398eb92e8e48c2886cfcf7f944e7f666fa5d4170c0db8eade00599d4d4333974eea5e285b0ecf946b5d
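checksums.yaml records SHA256 and SHA512 hex digests for the two archives inside the gem. As a rough illustration of how such digests can be compared, here is a minimal sketch using Ruby's standard `digest` library. The helper name `checksums_match?` and the temporary file are hypothetical stand-ins; nothing here reads the actual gem contents or reflects how RubyGems performs its own verification.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true only when both hex digests of the file at `path`
# equal the expected values.
def checksums_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Stand-in for metadata.gz or data.tar.gz (assumption: any file works the same way).
file = Tempfile.new("artifact")
file.write("example artifact bytes")
file.flush

# In practice the expected values would come from checksums.yaml.
expected_sha256 = Digest::SHA256.file(file.path).hexdigest
expected_sha512 = Digest::SHA512.file(file.path).hexdigest

puts checksums_match?(file.path, sha256: expected_sha256, sha512: expected_sha512)
puts checksums_match?(file.path, sha256: "0" * 64, sha512: expected_sha512)
```

The first call prints `true` and the second prints `false`, since a single mismatched digest is enough to fail the check.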
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -11,7 +11,7 @@ Supports:
 - Summarization - *in development*
 - Translation - *in development*
 
-[![Build Status](https://
+[![Build Status](https://github.com/ankane/informers/workflows/build/badge.svg?branch=master)](https://github.com/ankane/informers/actions)
 
 ## Installation
 
@@ -106,11 +106,19 @@ This returns
 Task | Description | Contributor | License | Link
 --- | --- | --- | --- | ---
 Sentiment analysis | DistilBERT fine-tuned on SST-2 | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
-Question answering | DistilBERT | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-cased-distilled-squad)
+Question answering | DistilBERT fine-tuned on SQuAD | Hugging Face | Apache-2.0 | [Link](https://huggingface.co/distilbert-base-cased-distilled-squad)
 Named-entity recognition | BERT fine-tuned on CoNLL03 | Bayerische Staatsbibliothek | In-progress | [Link](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)
 
 Models are [quantized](https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7) to make them faster and smaller.
 
+## Deployment
+
+Check out [Trove](https://github.com/ankane/trove) for deploying models.
+
+```sh
+trove push sentiment-analysis.onnx
+```
+
 ## Credits
 
 This project uses many state-of-the-art technologies:
data/lib/informers.rb
CHANGED
data/lib/informers/feature_extraction.rb
ADDED
@@ -0,0 +1,59 @@
+# Copyright 2018 The HuggingFace Inc. team.
+# Copyright 2020 Andrew Kane.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+module Informers
+  class FeatureExtraction
+    def initialize(model_path)
+      tokenizer_path = File.expand_path("../../vendor/bert_base_cased_tok.bin", __dir__)
+      @tokenizer = BlingFire.load_model(tokenizer_path)
+      @model = OnnxRuntime::Model.new(model_path)
+    end
+
+    def predict(texts)
+      singular = !texts.is_a?(Array)
+      texts = [texts] if singular
+
+      # tokenize
+      input_ids =
+        texts.map do |text|
+          tokens = @tokenizer.text_to_ids(text, nil, 100) # unk token
+          tokens.unshift(101) # cls token
+          tokens << 102 # sep token
+          tokens
+        end
+
+      max_tokens = input_ids.map(&:size).max
+      attention_mask = []
+      input_ids.each do |ids|
+        zeros = [0] * (max_tokens - ids.size)
+
+        mask = ([1] * ids.size) + zeros
+        attention_mask << mask
+
+        ids.concat(zeros)
+      end
+
+      # infer
+      input = {
+        input_ids: input_ids,
+        attention_mask: attention_mask
+      }
+      output = @model.predict(input)
+      scores = output["output_0"]
+
+      singular ? scores.first : scores
+    end
+  end
+end
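The padding step in `predict` can be tried in isolation, without BlingFire or ONNX Runtime. The sketch below is an illustration rather than the gem's code: `pad_batch` is a hypothetical helper, the token IDs other than 101 (CLS) and 102 (SEP) are made up, and unlike the original (which pads in place with `concat`) it returns new arrays.

```ruby
# Hypothetical helper mirroring the padding logic in the diff above:
# pads every ID list with zeros to the longest length and builds a
# matching attention mask (1 for real tokens, 0 for padding).
def pad_batch(input_ids)
  max_tokens = input_ids.map(&:size).max
  padded = []
  attention_mask = []
  input_ids.each do |ids|
    zeros = [0] * (max_tokens - ids.size)
    attention_mask << (([1] * ids.size) + zeros)
    padded << (ids + zeros) # non-mutating, unlike ids.concat(zeros)
  end
  [padded, attention_mask]
end

# Made-up token IDs wrapped in CLS (101) and SEP (102).
batch = [[101, 7, 8, 9, 102], [101, 7, 102]]
padded, mask = pad_batch(batch)

p padded # => [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
p mask   # => [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Every row ends up the same length, which lets the batch be passed to the model as a rectangular array, while the mask marks the padded positions.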
data/lib/informers/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: informers
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.2
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-
+date: 2020-11-24 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: blingfire
@@ -94,7 +94,7 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description:
+description:
 email: andrew@chartkick.com
 executables: []
 extensions: []
@@ -104,6 +104,7 @@ files:
 - LICENSE.txt
 - README.md
 - lib/informers.rb
+- lib/informers/feature_extraction.rb
 - lib/informers/ner.rb
 - lib/informers/question_answering.rb
 - lib/informers/sentiment_analysis.rb
@@ -116,7 +117,7 @@ homepage: https://github.com/ankane/informers
 licenses:
 - Apache-2.0
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -131,8 +132,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: State-of-the-art natural language processing for Ruby
 test_files: []