what_you_say 0.4.3-x64-mingw-ucrt → 0.5.0-x64-mingw-ucrt
- checksums.yaml +4 -4
- data/README.md +35 -10
- data/lib/what_you_say/3.1/what_you_say.so +0 -0
- data/lib/what_you_say/3.2/what_you_say.so +0 -0
- data/lib/what_you_say/lang.rb +1 -1
- data/lib/what_you_say/version.rb +2 -2
- data/lib/what_you_say.rb +5 -7
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d61faf65bcfae71d1e9d4aad9db6b26749acb77e4e72e79b3cc869fd91c3ef80
+  data.tar.gz: 212615ddd8e297823c42006cbeab98355efd1419e6d44ece685b1982cd45891f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 53ba66fd520dd0d0153cc6a03b6c254320c4991fcc252864a0555d4c81028b5a0057275a3896a44bcc93b1e667cf395347ae4facbed8422f88379f27bab5d5d7
+  data.tar.gz: aca2c167ecd3fe1dd826b9cc5b8bd756a06af3e546b6d241dc41668ca33b0381951bf52a7212787d5202fbe2bebd7411410683c390b8db7fab4690646cb30790
|
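The `checksums.yaml` entries are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a rough sketch of how such digests could be recomputed and compared locally with Ruby's standard `digest` library: the helper name `digests_match?` is hypothetical, and the temporary file is only a stand-in for an extracted archive, not the real one.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: recompute SHA256 and SHA512 for a file and compare
# them with the hex digests recorded in a checksums.yaml-style hash.
def digests_match?(path, expected)
  actual = {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest,
  }
  expected.all? { |algorithm, hex| actual[algorithm] == hex }
end

# Stand-in for an extracted archive such as data.tar.gz.
sample = Tempfile.new("data.tar.gz")
sample.write("example payload")
sample.flush

# Digests of the same payload, playing the role of the recorded checksums.
expected = {
  "SHA256" => Digest::SHA256.hexdigest("example payload"),
  "SHA512" => Digest::SHA512.hexdigest("example payload"),
}

puts digests_match?(sample.path, expected) # true for an unmodified file
```

To check a real release, one would pass the path of the extracted archive and the digests from the diff above in place of the stand-ins.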
data/README.md
CHANGED
@@ -1,14 +1,12 @@
 # WhatYouSay

-Quick and easy natural language detection wrapping the [
+Quick and easy natural language detection wrapping the [lingua-rs Rust crate](https://github.com/pemistahl/lingua-rs). Instantly identify the source language of a piece of text.



-- Supports [
+- Supports [75+ languages](https://github.com/pemistahl/lingua-rs/tree/main#3-which-languages-are-supported)
 - Core library is written in Rust; this is a Ruby wrapper to it
 - Lightweight, fast, and simple
-- Recognizes not only a language, but also a script (Latin, Cyrillic, etc)
-- Provides reliability information

 ## Installation

@@ -22,25 +20,52 @@ If bundler is not being used to manage dependencies, install the gem by executin

 ## Usage

-The method to call is `
+The method to call is `detect_language`.
+
+Pass in the text whose language you want to detect:

 ```ruby
 require "what_you_say"

 text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!"

-result = WhatYouSay.
+result = WhatYouSay.new.detect_language(text)

-assert_equal("epo", result.code)
-assert_equal("esperanto", result.eng_name)
+assert_equal("epo", result.lang.code)
+assert_equal("esperanto", result.lang.eng_name)
 ```

 You also have to opportunity to `inspect` some output:

 ```ruby
 text = "Եվ ահա ես ստանում եմ մի զանգ պատահական տղայից"
-WhatYouSay.
-#=> #<WhatYouSay::Lang code="hye" eng_name="
+WhatYouSay.new.detect_language(text).inspect
+#=> #<WhatYouSay::Lang code="hye" eng_name="armenian">
+```
+
+Not everything in life is perfect, and neither is this lib. Sometimes language detection will be wildly mistaken. You
+can attempt to correct this by passing in an `allowlist` of supported languages:
+
+```ruby
+text = "สวัสดี Rágis hello"
+result = WhatYouSay.new.detect_language(text)
+
+assert_equal("spanish", result.eng_name)
+
+result = WhatYouSay.new(allowlist: ["English", "Thai"]).detect_language(text)
+
+assert_equal("eng", result.code)
+```
+
+If a language truly cannot be detected, the `Unknown` language type is returned:
+
+```ruby
+text = "日本語"
+
+result = WhatYouSay.new(allowlist: ["English", "Thai"]).detect_language(text)
+
+assert_equal("???", result.code)
+assert_equal("unknown", result.eng_name)
 ```

 ## Development
data/lib/what_you_say/3.1/what_you_say.so
Binary file
data/lib/what_you_say/3.2/what_you_say.so
Binary file
data/lib/what_you_say/lang.rb
CHANGED
data/lib/what_you_say/version.rb
CHANGED
data/lib/what_you_say.rb
CHANGED
@@ -9,13 +9,11 @@ if ENV.fetch("DEBUG", false)
   require "debug"
 end

-
-
-
-
-      raise TypeError, "text must be UTF-8 encoded; got #{text.encoding}!" unless text.encoding.name == "UTF-8"
+class WhatYouSay
+  def detect_language(text)
+    raise TypeError, "text must be a String; got a #{text.class}!" unless text.is_a?(String)
+    raise TypeError, "text must be UTF-8 encoded; got #{text.encoding}!" unless text.encoding.name == "UTF-8"

-
-    end
+    detect_text(text)
   end
 end
|
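The new `detect_language` rejects non-String and non-UTF-8 input before delegating to `detect_text`. The sketch below illustrates what that means for callers. It assumes a stand-in class, `GuardedDetector` (hypothetical, not part of the gem), which copies only the two guard clauses from the diff and replaces the native `detect_text` with a stub, so it runs without the compiled extension.

```ruby
# Hypothetical stand-in that copies only the two guard clauses shown in the
# diff; the real gem delegates to a native `detect_text` instead of this stub.
class GuardedDetector
  def detect_language(text)
    raise TypeError, "text must be a String; got a #{text.class}!" unless text.is_a?(String)
    raise TypeError, "text must be UTF-8 encoded; got #{text.encoding}!" unless text.encoding.name == "UTF-8"

    detect_text(text)
  end

  private

  # Stub: echoes the input so the example runs without the native extension.
  def detect_text(text)
    text
  end
end

detector = GuardedDetector.new

# Text held in another encoding trips the second guard...
latin1 = "Rágis".encode("ISO-8859-1")
begin
  detector.detect_language(latin1)
rescue TypeError => e
  puts e.message
end

# ...so transcode it to UTF-8 before calling.
puts detector.detect_language(latin1.encode("UTF-8"))
```

Transcoding with `String#encode` changes the bytes to match the target encoding. `force_encoding` only relabels them, so it is appropriate only when the bytes are already valid UTF-8.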
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: what_you_say
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.0
 platform: x64-mingw-ucrt
 authors:
 - Garen J. Torikian
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2023-04-
+date: 2023-04-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -38,8 +38,8 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.2'
-description: Natural language
-  Currently wraps the
+description: Natural language detection with a focus on simplicity and performance.
+  Currently wraps the lingua-rs Rust crate.
 email:
 - gjtorikian@users.noreply.github.com
 executables: []