blingfire 0.1.6 → 0.1.8

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: fa9f7ebf09e5745b6865d4c5645b36f42b7c484ed667ce493febee80887e89ad
4
- data.tar.gz: 7fb6e8716138c35396964081b8ad60d7c01db88e71e4fbe219d0ea27a6eff94b
3
+ metadata.gz: 466b528fca6404415ad072b597cbb6fc96628a325504aba0e7becc4793b35b3f
4
+ data.tar.gz: c1e02e1f8c48578c6407ac9f7d7a996ecd78205ffdcaa3158144012f61d8922e
5
5
  SHA512:
6
- metadata.gz: 4f1e20f3520f4be17af2df5db4b3e05b169756f1667dccb08dbafb89ee07f633862d16649fa8e3be505073ca58a11989cc5c9b99d3f571aae2eaff3d4053dd8c
7
- data.tar.gz: 8f70ecb06ddcad466508a0debe63fd99e9a5cbab35e3702296c8ff522ffbb5075ad52b2ce68d2f316de6685256b756ff098bc5a2b29e807652b89de84f9b4661
6
+ metadata.gz: cbfd60e8e54b42a1e7258afcebb51d27029bf66a2969daf4badc62f3117d7113b8d02f99c3445513dc37f08698205c21ba97787ba761512166257438b73d3956
7
+ data.tar.gz: 1aa595a55116d5e084d31e087180ed04392d1b7d91932a875d53cab88bc72b3a1ae179c15094b9679aa9fa78dc4419afc70f144ba971343d0c465a7842ca634e
data/CHANGELOG.md CHANGED
@@ -1,3 +1,12 @@
1
+ ## 0.1.8 (2023-02-01)
2
+
3
+ - Improved ARM detection
4
+
5
+ ## 0.1.7 (2021-09-24)
6
+
7
+ - Updated Bling Fire to 0.1.8
8
+ - Added `ids_to_text` method
9
+
1
10
  ## 0.1.6 (2021-06-07)
2
11
 
3
12
  - Updated Bling Fire to 0.1.7
data/README.md CHANGED
@@ -1,15 +1,15 @@
1
- # Bling Fire
1
+ # Bling Fire Ruby
2
2
 
3
3
  [Bling Fire](https://github.com/microsoft/BlingFire) - high speed text tokenization - for Ruby
4
4
 
5
- [![Build Status](https://github.com/ankane/blingfire/workflows/build/badge.svg?branch=master)](https://github.com/ankane/blingfire/actions)
5
+ [![Build Status](https://github.com/ankane/blingfire-ruby/workflows/build/badge.svg?branch=master)](https://github.com/ankane/blingfire-ruby/actions)
6
6
 
7
7
  ## Installation
8
8
 
9
9
  Add this line to your application’s Gemfile:
10
10
 
11
11
  ```ruby
12
- gem 'blingfire'
12
+ gem "blingfire"
13
13
  ```
14
14
 
15
15
  ## Getting Started
@@ -82,24 +82,38 @@ Disable prefix space
82
82
  model = BlingFire.load_model("roberta.bin", prefix: false)
83
83
  ```
84
84
 
85
+ ## Ids to Text [experimental]
86
+
87
+ Load a model
88
+
89
+ ```ruby
90
+ model = BlingFire.load_model("bert_base_tok.i2w")
91
+ ```
92
+
93
+ Convert ids to text
94
+
95
+ ```ruby
96
+ model.ids_to_text(ids)
97
+ ```
98
+
85
99
  ## History
86
100
 
87
- View the [changelog](https://github.com/ankane/blingfire/blob/master/CHANGELOG.md)
101
+ View the [changelog](https://github.com/ankane/blingfire-ruby/blob/master/CHANGELOG.md)
88
102
 
89
103
  ## Contributing
90
104
 
91
105
  Everyone is encouraged to help improve this project. Here are a few ways you can help:
92
106
 
93
- - [Report bugs](https://github.com/ankane/blingfire/issues)
94
- - Fix bugs and [submit pull requests](https://github.com/ankane/blingfire/pulls)
107
+ - [Report bugs](https://github.com/ankane/blingfire-ruby/issues)
108
+ - Fix bugs and [submit pull requests](https://github.com/ankane/blingfire-ruby/pulls)
95
109
  - Write, clarify, or fix documentation
96
110
  - Suggest or add new features
97
111
 
98
112
  To get started with development:
99
113
 
100
114
  ```sh
101
- git clone https://github.com/ankane/blingfire.git
102
- cd blingfire
115
+ git clone https://github.com/ankane/blingfire-ruby.git
116
+ cd blingfire-ruby
103
117
  bundle install
104
118
  bundle exec rake vendor:all download:models
105
119
  bundle exec rake test
data/lib/blingfire/ffi.rb CHANGED
@@ -45,5 +45,8 @@ module BlingFire
45
45
 
46
46
  # prefix
47
47
  extern "int SetNoDummyPrefix(void* ModelPtr, bool fNoDummyPrefix)"
48
+
49
+ # ids to text
50
+ extern "int IdsToText(void* ModelPtr, int32_t * pIdsArr, int IdsCount, char * pOutUtf8Str, int MaxOutUtf8StrByteCount, bool SkipSpecialTokens)"
48
51
  end
49
52
  end
@@ -61,6 +61,14 @@ module BlingFire
61
61
  end
62
62
  end
63
63
 
64
+ def ids_to_text(ids, skip_special_tokens: true, output_buffer_size: nil)
65
+ if @handle
66
+ BlingFire.ids_to_text(@handle, ids, skip_special_tokens: skip_special_tokens, output_buffer_size: output_buffer_size)
67
+ else
68
+ raise "Not implemented"
69
+ end
70
+ end
71
+
64
72
  def to_ptr
65
73
  @handle
66
74
  end
@@ -1,3 +1,3 @@
1
1
  module BlingFire
2
- VERSION = "0.1.6"
2
+ VERSION = "0.1.8"
3
3
  end
data/lib/blingfire.rb CHANGED
@@ -15,13 +15,13 @@ module BlingFire
15
15
  if Gem.win_platform?
16
16
  "blingfiretokdll.dll"
17
17
  elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
18
- if RbConfig::CONFIG["host_cpu"] =~ /arm/i
18
+ if RbConfig::CONFIG["host_cpu"] =~ /arm|aarch64/i
19
19
  "libblingfiretokdll.arm64.dylib"
20
20
  else
21
21
  "libblingfiretokdll.dylib"
22
22
  end
23
23
  else
24
- if RbConfig::CONFIG["host_cpu"] =~ /aarch64/i
24
+ if RbConfig::CONFIG["host_cpu"] =~ /arm|aarch64/i
25
25
  "libblingfiretokdll.arm64.so"
26
26
  else
27
27
  "libblingfiretokdll.so"
@@ -113,6 +113,15 @@ module BlingFire
113
113
  [result].concat(unpack_offsets(start_offsets, end_offsets, result, text))
114
114
  end
115
115
 
116
+ def ids_to_text(model, ids, skip_special_tokens: true, output_buffer_size: nil)
117
+ output_buffer_size ||= ids.size * 32
118
+ c_ids = Fiddle::Pointer[ids.pack("i*")]
119
+ out = Fiddle::Pointer.malloc(output_buffer_size)
120
+ out_size = FFI.IdsToText(model, c_ids, ids.size, out, output_buffer_size, skip_special_tokens ? 1 : 0)
121
+ check_status out_size, out
122
+ encode_utf8(out.to_str(out_size - 1))
123
+ end
124
+
116
125
  def free_model(model)
117
126
  FFI.FreeModel(model)
118
127
  end
Binary file
Binary file
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,59 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: blingfire
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.6
4
+ version: 0.1.8
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-06-07 00:00:00.000000000 Z
12
- dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: bundler
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - ">="
18
- - !ruby/object:Gem::Version
19
- version: '0'
20
- type: :development
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - ">="
25
- - !ruby/object:Gem::Version
26
- version: '0'
27
- - !ruby/object:Gem::Dependency
28
- name: rake
29
- requirement: !ruby/object:Gem::Requirement
30
- requirements:
31
- - - ">="
32
- - !ruby/object:Gem::Version
33
- version: '0'
34
- type: :development
35
- prerelease: false
36
- version_requirements: !ruby/object:Gem::Requirement
37
- requirements:
38
- - - ">="
39
- - !ruby/object:Gem::Version
40
- version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: minitest
43
- requirement: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - ">="
46
- - !ruby/object:Gem::Version
47
- version: '5'
48
- type: :development
49
- prerelease: false
50
- version_requirements: !ruby/object:Gem::Requirement
51
- requirements:
52
- - - ">="
53
- - !ruby/object:Gem::Version
54
- version: '5'
11
+ date: 2023-02-02 00:00:00.000000000 Z
12
+ dependencies: []
55
13
  description:
56
- email: andrew@chartkick.com
14
+ email: andrew@ankane.org
57
15
  executables: []
58
16
  extensions: []
59
17
  extra_rdoc_files: []
@@ -71,7 +29,7 @@ files:
71
29
  - vendor/libblingfiretokdll.arm64.so
72
30
  - vendor/libblingfiretokdll.dylib
73
31
  - vendor/libblingfiretokdll.so
74
- homepage: https://github.com/ankane/blingfire
32
+ homepage: https://github.com/ankane/blingfire-ruby
75
33
  licenses:
76
34
  - MIT
77
35
  metadata: {}
@@ -90,7 +48,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
90
48
  - !ruby/object:Gem::Version
91
49
  version: '0'
92
50
  requirements: []
93
- rubygems_version: 3.2.3
51
+ rubygems_version: 3.4.1
94
52
  signing_key:
95
53
  specification_version: 4
96
54
  summary: High speed text tokenization for Ruby