carrot2 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -4
- data/LICENSE.txt +2 -2
- data/README.md +56 -48
- data/lib/carrot2.rb +60 -42
- data/lib/carrot2/version.rb +1 -1
- metadata +3 -17
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f7c82346a7d8fedbd0b875e1a755d5a16a1a2a28c680e7f71c1b7bebe9603ef8
+  data.tar.gz: b6493deebcce2d05039ba1c6546e4725f4fbc52d0d35f15583d1d49e6e9d086a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8587ddd05ff3d8c2d491b25c7a10bea228b92a198d18bdd788c03f44c40a89bdffa39cd7ac65b2ec59828c043cbceb4c2a127f2df961c83733440952e9201c23
+  data.tar.gz: f91d543e01915d25586b874970996f50c54a25ac4fd39d5a0a0bfb1a3e1aec84c992b60edea4acfc6073ffef5edef969bbce0174d7f32845bdf6d420a3b66596
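The entries in `checksums.yaml` are hex digests of the two archives packed inside the gem. As a minimal sketch of how they could be checked by hand, the helper below recomputes both digest families and compares them with the recorded values. It assumes the gem has already been unpacked so that an uncompressed `checksums.yaml`, `metadata.gz`, and `data.tar.gz` sit in one directory; the name `verify_checksums` and that directory layout are hypothetical, not part of the gem or of RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compare the digests recorded in checksums.yaml with
# digests computed from the files next to it in `dir`.
# Returns a nested hash such as {"SHA256" => {"metadata.gz" => true, ...}, ...}.
def verify_checksums(dir)
  expected = YAML.safe_load(File.read(File.join(dir, "checksums.yaml")))
  algorithms = {"SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512}

  expected.each_with_object({}) do |(algorithm, files), results|
    digest_class = algorithms.fetch(algorithm)
    results[algorithm] = files.to_h do |name, hex|
      # true when the recomputed digest equals the recorded one
      [name, digest_class.file(File.join(dir, name)).hexdigest == hex]
    end
  end
end
```

A `false` entry in the result means the file on disk does not match what the registry recorded for that release.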
data/CHANGELOG.md
CHANGED
@@ -1,18 +1,23 @@
-## 0.
+## 0.3.0 (2020-07-29)
+
+- Added support for Carrot2 4
+- Dropped support for Carrot2 < 4
+
+## 0.2.1 (2019-10-28)
 
 - Added `open_timeout` and `read_timeout` options
 
-## 0.2.0
+## 0.2.0 (2017-01-22)
 
 - Added `request` method
 - Removed dependency on `rest-client`
 - Better error messages
 
-## 0.1.0
+## 0.1.0 (2017-01-17)
 
 - Added support for env var
 - Added tests
 
-## 0.0.1
+## 0.0.1 (2012-06-20)
 
 - First version
data/LICENSE.txt
CHANGED
@@ -1,4 +1,4 @@
-Copyright (c) 2012 Andrew Kane
+Copyright (c) 2012-2020 Andrew Kane
 
 MIT License
 
@@ -19,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
CHANGED
@@ -1,12 +1,12 @@
 # Carrot2
 
-Ruby client for [Carrot2](https://
+Ruby client for [Carrot2](https://github.com/carrot2/carrot2) - the open-source document clustering server
 
-
+[](https://travis-ci.org/ankane/carrot2)
 
-
+## Installation
 
-With Homebrew, use:
+First, [download and run](https://github.com/carrot2/carrot2#installation) the Carrot2 server. With Homebrew, use:
 
 ```sh
 brew install carrot2
@@ -19,6 +19,8 @@ Then add this line to your application’s Gemfile:
 gem 'carrot2'
 ```
 
+The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 and [this readme](https://github.com/ankane/carrot2/blob/v0.2.1/README.md).
+
 ## How to Use
 
 To cluster documents, use:
@@ -39,35 +41,20 @@ This returns:
 
 ```ruby
 {
-  "
-  "clusters"=> [
+  "clusters" => [
     {
-      "
-      "
-      "
-      "score"=>0.
-      "documents"=>[0, 1, 2],
-      "attributes"=>{"score"=>0.06462323710740674}
+      "labels" => ["Coupon"],
+      "documents" => [0, 1, 2],
+      "clusters" => [],
+      "score" => 0.06418006702675011
     },
     {
-      "
-      "
-      "
-      "score"=>0.
-      "documents"=>[0, 1],
-      "attributes"=>{"score"=>0.05873148311034013}
-    },
-    {
-      "id"=>2,
-      "size"=>1,
-      "phrases"=>["Other Topics"],
-      "score"=>0.0,
-      "documents"=>[3],
-      "attributes"=>{"other-topics"=>true, "score"=>0.0}
+      "labels" => ["Exclusive"],
+      "documents" => [0, 1],
+      "clusters" => [],
+      "score" => 0.7040290701763807
     }
-  ]
-  "processing-time-algorithm"=>1,
-  "query"=>nil
+  ]
 }
 ```
 
@@ -76,17 +63,39 @@ Documents are numbered in the order provided, starting with 0.
 Specify a language with:
 
 ```ruby
-carrot2.cluster(documents, language: "
+carrot2.cluster(documents, language: "French")
+```
+
+Specify an [algorithm](https://carrot2.github.io/release/4.0.0/doc/algorithms/) with:
+
+```ruby
+carrot2.cluster(documents, algorithm: "Lingo")
 ```
 
-
+Get a list of supported languages and algorithms with:
 
-
+```ruby
+carrot2.list
+```
+
+Specify parameters with:
 
 ```ruby
-
-
-
+parameters = {
+  preprocessing: {
+    phraseDfThreshold: 1,
+    wordDfThreshold: 1
+  }
+}
+carrot2.cluster(documents, parameters: parameters)
+```
+
+See supported parameters for [Lingo](https://carrot2.github.io/release/4.0.0/doc/lingo-attributes/), [STC](https://carrot2.github.io/release/4.0.0/doc/stc-attributes/), and [Bisecting K-Means](https://carrot2.github.io/release/4.0.0/doc/kmeans-attributes/).
+
+Specify a [template](https://carrot2.github.io/release/4.0.0/doc/dcs-templates/) with:
+
+```ruby
+carrot2.cluster(documents, template: "lingo")
 ```
 
 ## Configuration
@@ -97,25 +106,15 @@ To specify the Carrot2 server, set `ENV["CARROT2_URL"]` or use:
 Carrot2.new(url: "http://localhost:8080")
 ```
 
-Set timeouts
+Set timeouts
 
 ```ruby
 Carrot2.new(open_timeout: 3, read_timeout: 5)
 ```
 
-##
+## Resources
 
-Carrot2
-
-You can find the `.war` file in the `war` directory in the dcs download. Then run:
-
-```sh
-heroku plugins:install heroku-cli-deploy
-heroku create <app_name>
-heroku war:deploy carrot2-dcs.war --app <app_name>
-```
-
-And set `ENV["CARROT2_URL"]` in your application.
+- [Carrot2 REST API Basics](https://carrot2.github.io/release/4.0.0/doc/rest-api-basics/)
 
 ## History
 
@@ -129,3 +128,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
 - Fix bugs and [submit pull requests](https://github.com/ankane/carrot2/pulls)
 - Write, clarify, or fix documentation
 - Suggest or add new features
+
+To get started with development:
+
+```sh
+git clone https://github.com/ankane/carrot2.git
+cd carrot2
+bundle install
+bundle exec rake test
+```
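The README options in this diff (`language`, `algorithm`, `parameters`, `template`) all end up in a single JSON request. The sketch below mirrors the request-building steps of `cluster` from the `data/lib/carrot2.rb` change in this same diff, but returns the path and body instead of sending them, so the mapping can be inspected without a running server. The function name `build_cluster_request` and the sample document text are hypothetical.

```ruby
require "json"
require "uri"

# Hypothetical helper mirroring the request-building steps of Carrot2#cluster
# in 0.3.0; it returns [path, json_body] rather than performing the POST.
def build_cluster_request(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
  # Defaults apply only when no template is given
  unless template
    language ||= "English"
    algorithm ||= "Lingo"
    parameters ||= {}
  end

  # Plain strings are wrapped as {field: "..."}; hashes pass through unchanged
  data = {documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }}
  data[:language] = language if language
  data[:algorithm] = algorithm if algorithm
  data[:parameters] = parameters if parameters

  # The template, if any, travels as a query parameter rather than in the body
  path = "service/cluster"
  path = "#{path}?#{URI.encode_www_form(template: template)}" if template

  [path, data.to_json]
end

path, body = build_cluster_request(["sample document text"], language: "French")
puts path
puts body
```

Note that passing `template:` suppresses the `English`/`Lingo` defaults, so the request body then carries only the documents plus whatever was given explicitly.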
data/lib/carrot2.rb
CHANGED
@@ -1,63 +1,81 @@
-
-require "builder"
-require "net/http"
+# dependencies
 require "json"
+require "net/http"
+
+# modules
+require "carrot2/version"
 
 class Carrot2
   class Error < StandardError; end
 
-
-
+  HEADERS = {
+    "Content-Type" => "application/json",
+    "Accept" => "application/json"
+  }
 
-
-
-    @uri = URI.parse(
+  def initialize(url: nil, open_timeout: 3, read_timeout: nil)
+    url ||= ENV["CARROT2_URL"] || "http://localhost:8080"
+    @uri = URI.parse(url)
+    @http = Net::HTTP.new(@uri.host, @uri.port)
+    @http.use_ssl = true if @uri.scheme == "https"
+    @http.open_timeout = open_timeout if open_timeout
+    @http.read_timeout = read_timeout if read_timeout
+  end
 
-
-
+  def list
+    get("service/list")
   end
 
-  def cluster(documents, language:
-
-
-
-
-
-          d.title document
-        end
-      end
+  def cluster(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
+    # no defaults if template
+    unless template
+      language ||= "English"
+      algorithm ||= "Lingo"
+      parameters ||= {}
     end
 
-
-
-
-
-
-
+    # data
+    data = {
+      documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }
+    }
+    data[:language] = language if language
+    data[:algorithm] = algorithm if algorithm
+    data[:parameters] = parameters if parameters
+
+    # path
+    path = "service/cluster"
+    path = "#{path}?#{URI.encode_www_form(template: template)}" if template
+
+    post(path, data)
   end
 
-
-    req = Net::HTTP::Post.new(@uri)
-    req.set_form_data(params.merge("dcs.output.format" => "JSON"))
+  private
 
-
-
-
-
-
+  def get(path)
+    handle_response do
+      @http.get("#{@uri.request_uri.chomp("/")}/#{path}", HEADERS)
+    end
+  end
 
-
-
+  def post(path, data)
+    handle_response do
+      @http.post("#{@uri.request_uri.chomp("/")}/#{path}", data.to_json, HEADERS)
+    end
+  end
+
+  def handle_response
+    begin
+      response = yield
+    rescue Errno::ECONNREFUSED => e
+      raise Carrot2::Error, e.message
     end
 
-
-      JSON.parse(response.body)
-
-      body = response.body.to_s
-      # try to get reason from title
-      m = body.match(/<title>(.+)<\/title>/)
-      message = m ? m[1] : body
+    unless response.kind_of?(Net::HTTPSuccess)
+      body = JSON.parse(response.body) rescue {}
+      message = body["message"] || "Bad response: #{response.code}"
       raise Carrot2::Error, message
     end
+
+    JSON.parse(response.body)
   end
 end
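The new `handle_response` replaces the old HTML `<title>` scraping with a JSON lookup: it prefers the `"message"` field of the error body and falls back to the status code. The sketch below isolates only that message-selection step so it can be exercised without a server. The name `error_message` is hypothetical, and two details differ from the diff on purpose: it rescues `JSON::ParserError` explicitly rather than using a bare `rescue` modifier, and it treats non-object JSON (for example an array) as having no message.

```ruby
require "json"

# Hypothetical helper isolating the message-selection step of handle_response:
# use the "message" field of a JSON error body when present, otherwise fall
# back to a generic message built from the HTTP status code.
def error_message(body, code)
  parsed =
    begin
      JSON.parse(body.to_s)
    rescue JSON::ParserError
      {} # body was empty or not JSON (e.g. an HTML error page)
    end
  parsed = {} unless parsed.is_a?(Hash) # assumption: ignore arrays and scalars
  parsed["message"] || "Bad response: #{code}"
end

puts error_message('{"message":"sample server error"}', "400")
puts error_message("<html>Bad Gateway</html>", "502")
```

In the gem itself the resulting string is what gets raised as `Carrot2::Error`.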
data/lib/carrot2/version.rb
CHANGED
metadata
CHANGED
@@ -1,29 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: carrot2
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-07-30 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: builder
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -96,7 +82,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
 summary: Ruby client for Carrot2