carrot2 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -4
- data/LICENSE.txt +2 -2
- data/README.md +56 -48
- data/lib/carrot2.rb +60 -42
- data/lib/carrot2/version.rb +1 -1
- metadata +3 -17
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f7c82346a7d8fedbd0b875e1a755d5a16a1a2a28c680e7f71c1b7bebe9603ef8
+  data.tar.gz: b6493deebcce2d05039ba1c6546e4725f4fbc52d0d35f15583d1d49e6e9d086a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8587ddd05ff3d8c2d491b25c7a10bea228b92a198d18bdd788c03f44c40a89bdffa39cd7ac65b2ec59828c043cbceb4c2a127f2df961c83733440952e9201c23
+  data.tar.gz: f91d543e01915d25586b874970996f50c54a25ac4fd39d5a0a0bfb1a3e1aec84c992b60edea4acfc6073ffef5edef969bbce0174d7f32845bdf6d420a3b66596
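The values in `checksums.yaml` are hex digests of the `metadata.gz` and `data.tar.gz` members of the gem package. As a minimal sketch, assuming one of those members has already been extracted to a local path, a SHA256 digest can be compared in Ruby as follows. The helper name `sha256_matches?` is hypothetical and is not part of the gem.

```ruby
require "digest"

# Hypothetical helper (not part of carrot2 or RubyGems).
# Returns true when the SHA256 hex digest of the file at `path`
# equals `expected_hex`; the comparison ignores letter case.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

For example, `sha256_matches?("data.tar.gz", expected)` would be called with `expected` set to the `data.tar.gz` value listed under `SHA256` above.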
data/CHANGELOG.md
CHANGED
@@ -1,18 +1,23 @@
-## 0.
+## 0.3.0 (2020-07-29)
+
+- Added support for Carrot2 4
+- Dropped support for Carrot2 < 4
+
+## 0.2.1 (2019-10-28)
 
 - Added `open_timeout` and `read_timeout` options
 
-## 0.2.0
+## 0.2.0 (2017-01-22)
 
 - Added `request` method
 - Removed dependency on `rest-client`
 - Better error messages
 
-## 0.1.0
+## 0.1.0 (2017-01-17)
 
 - Added support for env var
 - Added tests
 
-## 0.0.1
+## 0.0.1 (2012-06-20)
 
 - First version
data/LICENSE.txt
CHANGED
@@ -1,4 +1,4 @@
-Copyright (c) 2012 Andrew Kane
+Copyright (c) 2012-2020 Andrew Kane
 
 MIT License
 
@@ -19,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
CHANGED
@@ -1,12 +1,12 @@
 # Carrot2
 
-Ruby client for [Carrot2](https://
+Ruby client for [Carrot2](https://github.com/carrot2/carrot2) - the open-source document clustering server
 
-
+[![Build Status](https://travis-ci.org/ankane/carrot2.svg?branch=master)](https://travis-ci.org/ankane/carrot2)
 
-
+## Installation
 
-With Homebrew, use:
+First, [download and run](https://github.com/carrot2/carrot2#installation) the Carrot2 server. With Homebrew, use:
 
 ```sh
 brew install carrot2
@@ -19,6 +19,8 @@ Then add this line to your application’s Gemfile:
 gem 'carrot2'
 ```
 
+The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 and [this readme](https://github.com/ankane/carrot2/blob/v0.2.1/README.md).
+
 ## How to Use
 
 To cluster documents, use:
@@ -39,35 +41,20 @@ This returns:
 
 ```ruby
 {
-  "
-  "clusters"=> [
+  "clusters" => [
     {
-      "
-      "
-      "
-      "score"=>0.
-      "documents"=>[0, 1, 2],
-      "attributes"=>{"score"=>0.06462323710740674}
+      "labels" => ["Coupon"],
+      "documents" => [0, 1, 2],
+      "clusters" => [],
+      "score" => 0.06418006702675011
     },
     {
-      "
-      "
-      "
-      "score"=>0.
-      "documents"=>[0, 1],
-      "attributes"=>{"score"=>0.05873148311034013}
-    },
-    {
-      "id"=>2,
-      "size"=>1,
-      "phrases"=>["Other Topics"],
-      "score"=>0.0,
-      "documents"=>[3],
-      "attributes"=>{"other-topics"=>true, "score"=>0.0}
+      "labels" => ["Exclusive"],
+      "documents" => [0, 1],
+      "clusters" => [],
+      "score" => 0.7040290701763807
     }
-  ]
-  "processing-time-algorithm"=>1,
-  "query"=>nil
+  ]
 }
 ```
 
@@ -76,17 +63,39 @@ Documents are numbered in the order provided, starting with 0.
 Specify a language with:
 
 ```ruby
-carrot2.cluster(documents, language: "
+carrot2.cluster(documents, language: "French")
+```
+
+Specify an [algorithm](https://carrot2.github.io/release/4.0.0/doc/algorithms/) with:
+
+```ruby
+carrot2.cluster(documents, algorithm: "Lingo")
 ```
 
-
+Get a list of supported languages and algorithms with:
 
-
+```ruby
+carrot2.list
+```
+
+Specify parameters with:
 
 ```ruby
-
-
-
+parameters = {
+  preprocessing: {
+    phraseDfThreshold: 1,
+    wordDfThreshold: 1
+  }
+}
+carrot2.cluster(documents, parameters: parameters)
+```
+
+See supported parameters for [Lingo](https://carrot2.github.io/release/4.0.0/doc/lingo-attributes/), [STC](https://carrot2.github.io/release/4.0.0/doc/stc-attributes/), and [Bisecting K-Means](https://carrot2.github.io/release/4.0.0/doc/kmeans-attributes/).
+
+Specify a [template](https://carrot2.github.io/release/4.0.0/doc/dcs-templates/) with:
+
+```ruby
+carrot2.cluster(documents, template: "lingo")
 ```
 
 ## Configuration
@@ -97,25 +106,15 @@ To specify the Carrot2 server, set `ENV["CARROT2_URL"]` or use:
 Carrot2.new(url: "http://localhost:8080")
 ```
 
-Set timeouts
+Set timeouts
 
 ```ruby
 Carrot2.new(open_timeout: 3, read_timeout: 5)
 ```
 
-##
+## Resources
 
-Carrot2
-
-You can find the `.war` file in the `war` directory in the dcs download. Then run:
-
-```sh
-heroku plugins:install heroku-cli-deploy
-heroku create <app_name>
-heroku war:deploy carrot2-dcs.war --app <app_name>
-```
-
-And set `ENV["CARROT2_URL"]` in your application.
+- [Carrot2 REST API Basics](https://carrot2.github.io/release/4.0.0/doc/rest-api-basics/)
 
 ## History
 
@@ -129,3 +128,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
 - Fix bugs and [submit pull requests](https://github.com/ankane/carrot2/pulls)
 - Write, clarify, or fix documentation
 - Suggest or add new features
+
+To get started with development:
+
+```sh
+git clone https://github.com/ankane/carrot2.git
+cd carrot2
+bundle install
+bundle exec rake test
+```
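The new README response format lists each cluster's `labels` together with the indexes of its `documents`, numbered from 0 in the order provided. As a rough sketch of how those indexes can be mapped back to the original documents, the snippet below works on a hash shaped like the README example. The `documents` array and the helper name `labeled_documents` are made up for illustration. No server is contacted.

```ruby
# Hypothetical input documents; only their positions matter here.
documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is just another document."
]

# Response shaped like the README example for version 0.3.0.
response = {
  "clusters" => [
    {"labels" => ["Coupon"], "documents" => [0, 1, 2], "clusters" => [], "score" => 0.06418006702675011},
    {"labels" => ["Exclusive"], "documents" => [0, 1], "clusters" => [], "score" => 0.7040290701763807}
  ]
}

# Hypothetical helper: maps each top-level cluster label to the
# documents it contains. Nested clusters are ignored in this sketch.
def labeled_documents(response, documents)
  response["clusters"].each_with_object({}) do |cluster, result|
    label = cluster["labels"].join(", ")
    result[label] = cluster["documents"].map { |index| documents[index] }
  end
end

grouped = labeled_documents(response, documents)
```

Here `grouped["Exclusive"]` holds the first two documents, because that cluster lists indexes 0 and 1.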
data/lib/carrot2.rb
CHANGED
@@ -1,63 +1,81 @@
-
-require "builder"
-require "net/http"
+# dependencies
 require "json"
+require "net/http"
+
+# modules
+require "carrot2/version"
 
 class Carrot2
   class Error < StandardError; end
 
-
-
+  HEADERS = {
+    "Content-Type" => "application/json",
+    "Accept" => "application/json"
+  }
 
-
-
-    @uri = URI.parse(
+  def initialize(url: nil, open_timeout: 3, read_timeout: nil)
+    url ||= ENV["CARROT2_URL"] || "http://localhost:8080"
+    @uri = URI.parse(url)
+    @http = Net::HTTP.new(@uri.host, @uri.port)
+    @http.use_ssl = true if @uri.scheme == "https"
+    @http.open_timeout = open_timeout if open_timeout
+    @http.read_timeout = read_timeout if read_timeout
+  end
 
-
-
+  def list
+    get("service/list")
   end
 
-  def cluster(documents, language:
-
-
-
-
-
-    d.title document
-    end
-    end
+  def cluster(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
+    # no defaults if template
+    unless template
+      language ||= "English"
+      algorithm ||= "Lingo"
+      parameters ||= {}
     end
 
-
-
-
-
-
-
+    # data
+    data = {
+      documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }
+    }
+    data[:language] = language if language
+    data[:algorithm] = algorithm if algorithm
+    data[:parameters] = parameters if parameters
+
+    # path
+    path = "service/cluster"
+    path = "#{path}?#{URI.encode_www_form(template: template)}" if template
+
+    post(path, data)
   end
 
-
-    req = Net::HTTP::Post.new(@uri)
-    req.set_form_data(params.merge("dcs.output.format" => "JSON"))
+  private
 
-
-
-
-
-
+  def get(path)
+    handle_response do
+      @http.get("#{@uri.request_uri.chomp("/")}/#{path}", HEADERS)
+    end
+  end
 
-
-
+  def post(path, data)
+    handle_response do
+      @http.post("#{@uri.request_uri.chomp("/")}/#{path}", data.to_json, HEADERS)
+    end
+  end
+
+  def handle_response
+    begin
+      response = yield
+    rescue Errno::ECONNREFUSED => e
+      raise Carrot2::Error, e.message
     end
 
-
-    JSON.parse(response.body)
-
-    body = response.body.to_s
-    # try to get reason from title
-    m = body.match(/<title>(.+)<\/title>/)
-    message = m ? m[1] : body
+    unless response.kind_of?(Net::HTTPSuccess)
+      body = JSON.parse(response.body) rescue {}
+      message = body["message"] || "Bad response: #{response.code}"
       raise Carrot2::Error, message
     end
+
+    JSON.parse(response.body)
   end
 end
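The new `cluster` method turns its arguments into a JSON body and a request path before posting. The sketch below restates that step as a standalone function so the result can be inspected without a running Carrot2 server. The function name `build_cluster_request` is hypothetical and does not exist in the gem; the logic is copied from the added lines of the diff.

```ruby
require "json"
require "uri"

# Hypothetical standalone restatement of the request-building part of
# Carrot2#cluster from the diff above. Returns [path, json_body].
def build_cluster_request(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
  # Defaults are applied only when no template is given.
  unless template
    language ||= "English"
    algorithm ||= "Lingo"
    parameters ||= {}
  end

  # Plain strings are wrapped as {field: "..."}; hashes pass through as-is.
  data = {
    documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }
  }
  data[:language] = language if language
  data[:algorithm] = algorithm if algorithm
  data[:parameters] = parameters if parameters

  # The template name, when present, goes into the query string.
  path = "service/cluster"
  path = "#{path}?#{URI.encode_www_form(template: template)}" if template

  [path, data.to_json]
end

path, body = build_cluster_request(["Sign up for an exclusive coupon."], language: "French")
template_path, template_body = build_cluster_request(["Sign up for an exclusive coupon."], template: "lingo")
```

With a template, the body carries only the documents and `template_path` is `service/cluster?template=lingo`. Without one, `language`, `algorithm`, and `parameters` are always present in the body.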
data/lib/carrot2/version.rb
CHANGED
metadata
CHANGED
@@ -1,29 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: carrot2
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-07-30 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: builder
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -96,7 +82,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       - !ruby/object:Gem::Version
         version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
 summary: Ruby client for Carrot2