carrot2 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 73489263c06ae2b8e9f7509570c35c052635340d8401f04d659fe4080188b049
-   data.tar.gz: 891c4d26486b4c39c1cdff7756bb97faea0ff01ae5aceaa8082ed122d4300f6c
+   metadata.gz: f7c82346a7d8fedbd0b875e1a755d5a16a1a2a28c680e7f71c1b7bebe9603ef8
+   data.tar.gz: b6493deebcce2d05039ba1c6546e4725f4fbc52d0d35f15583d1d49e6e9d086a
  SHA512:
-   metadata.gz: 30e723899e33b4ce32463b52a65b24ac1f1a1fc7a39fec038dfd95c5fad7cdf0bb78555e24fdeff18657905ca194fcc59afded05dc292cfcb06ca7aa919eca59
-   data.tar.gz: 1dae126a035baf128cc7e4ec265259056e1817694b2a61317edc339f4b193a1f0b81c4f08bc80a8d98fda28dd69f4ef12be75f30d2450be55654f33ee8085927
+   metadata.gz: 8587ddd05ff3d8c2d491b25c7a10bea228b92a198d18bdd788c03f44c40a89bdffa39cd7ac65b2ec59828c043cbceb4c2a127f2df961c83733440952e9201c23
+   data.tar.gz: f91d543e01915d25586b874970996f50c54a25ac4fd39d5a0a0bfb1a3e1aec84c992b60edea4acfc6073ffef5edef969bbce0174d7f32845bdf6d420a3b66596
@@ -1,18 +1,23 @@
- ## 0.2.1
+ ## 0.3.0 (2020-07-29)
+
+ - Added support for Carrot2 4
+ - Dropped support for Carrot2 < 4
+
+ ## 0.2.1 (2019-10-28)

  - Added `open_timeout` and `read_timeout` options

- ## 0.2.0
+ ## 0.2.0 (2017-01-22)

  - Added `request` method
  - Removed dependency on `rest-client`
  - Better error messages

- ## 0.1.0
+ ## 0.1.0 (2017-01-17)

  - Added support for env var
  - Added tests

- ## 0.0.1
+ ## 0.0.1 (2012-06-20)

  - First version
@@ -1,4 +1,4 @@
- Copyright (c) 2012 Andrew Kane
+ Copyright (c) 2012-2020 Andrew Kane

  MIT License

@@ -19,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,12 +1,12 @@
  # Carrot2

- Ruby client for [Carrot2](https://project.carrot2.org/) - the open-source document clustering server
+ Ruby client for [Carrot2](https://github.com/carrot2/carrot2) - the open-source document clustering server

- ## Installation
+ [![Build Status](https://travis-ci.org/ankane/carrot2.svg?branch=master)](https://travis-ci.org/ankane/carrot2)

- First, [download and run](https://project.carrot2.org/download-dcs.html) the Carrot2 server. It’s the one on [this page](https://github.com/carrot2/carrot2/releases) that begins with `carrot2-dcs`.
+ ## Installation

- With Homebrew, use:
+ First, [download and run](https://github.com/carrot2/carrot2#installation) the Carrot2 server. With Homebrew, use:

  ```sh
  brew install carrot2
@@ -19,6 +19,8 @@ Then add this line to your application’s Gemfile:
  gem 'carrot2'
  ```

+ The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 and [this readme](https://github.com/ankane/carrot2/blob/v0.2.1/README.md).
+
  ## How to Use

  To cluster documents, use:
@@ -39,35 +41,20 @@ This returns:

  ```ruby
  {
-   "processing-time-total"=>1,
-   "clusters"=> [
+   "clusters" => [
      {
-       "id"=>0,
-       "size"=>3,
-       "phrases"=>["Coupon"],
-       "score"=>0.06462323710740674,
-       "documents"=>[0, 1, 2],
-       "attributes"=>{"score"=>0.06462323710740674}
+       "labels" => ["Coupon"],
+       "documents" => [0, 1, 2],
+       "clusters" => [],
+       "score" => 0.06418006702675011
      },
      {
-       "id"=>1,
-       "size"=>2,
-       "phrases"=>["Exclusive"],
-       "score"=>0.05873148311034013,
-       "documents"=>[0, 1],
-       "attributes"=>{"score"=>0.05873148311034013}
-     },
-     {
-       "id"=>2,
-       "size"=>1,
-       "phrases"=>["Other Topics"],
-       "score"=>0.0,
-       "documents"=>[3],
-       "attributes"=>{"other-topics"=>true, "score"=>0.0}
+       "labels" => ["Exclusive"],
+       "documents" => [0, 1],
+       "clusters" => [],
+       "score" => 0.7040290701763807
      }
-   ],
-   "processing-time-algorithm"=>1,
-   "query"=>nil
+   ]
  }
  ```
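
The new response shape can be consumed roughly as follows. This is a minimal sketch, not code from the gem: `label_documents` is a hypothetical helper, and the `documents` and `response` values are invented sample data shaped like the README output in this hunk. It relies only on the README's statement that documents are numbered in the order provided, starting with 0.

```ruby
# Hypothetical helper (not part of the carrot2 gem): maps each cluster's
# labels to the original document strings via the 0-based indexes.
def label_documents(response, documents)
  response.fetch("clusters", []).each_with_object({}) do |cluster, result|
    label = cluster.fetch("labels", []).join(", ")
    result[label] = cluster.fetch("documents", []).map { |index| documents[index] }
  end
end

# Invented sample input, for illustration only.
documents = [
  "Sign up for an exclusive coupon",
  "Exclusive members-only coupon",
  "Coupon for new subscribers",
  "Shipping rates"
]

# Invented sample response in the 0.3.0 shape ("labels" replaces "phrases").
response = {
  "clusters" => [
    {"labels" => ["Coupon"], "documents" => [0, 1, 2], "clusters" => [], "score" => 0.06},
    {"labels" => ["Exclusive"], "documents" => [0, 1], "clusters" => [], "score" => 0.70}
  ]
}

label_documents(response, documents).each do |label, texts|
  puts "#{label}: #{texts.inspect}"
end
```

Code written against 0.2.1 that read `"phrases"`, `"id"`, or `"size"` would need adjusting, since those keys no longer appear in the output shown in this hunk.
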
@@ -76,17 +63,39 @@ Documents are numbered in the order provided, starting with 0.
  Specify a language with:

  ```ruby
- carrot2.cluster(documents, language: "FRENCH")
+ carrot2.cluster(documents, language: "French")
+ ```
+
+ Specify an [algorithm](https://carrot2.github.io/release/4.0.0/doc/algorithms/) with:
+
+ ```ruby
+ carrot2.cluster(documents, algorithm: "Lingo")
  ```

- [All of these languages are supported](https://doc.carrot2.org/#section.faq.preliminaries.supported-languages)
+ Get a list of supported languages and algorithms with:

- For other requests, use:
+ ```ruby
+ carrot2.list
+ ```
+
+ Specify parameters with:

  ```ruby
- carrot2.request(
-   "dcs.c2stream" => xml_str
- )
+ parameters = {
+   preprocessing: {
+     phraseDfThreshold: 1,
+     wordDfThreshold: 1
+   }
+ }
+ carrot2.cluster(documents, parameters: parameters)
+ ```
+
+ See supported parameters for [Lingo](https://carrot2.github.io/release/4.0.0/doc/lingo-attributes/), [STC](https://carrot2.github.io/release/4.0.0/doc/stc-attributes/), and [Bisecting K-Means](https://carrot2.github.io/release/4.0.0/doc/kmeans-attributes/).
+
+ Specify a [template](https://carrot2.github.io/release/4.0.0/doc/dcs-templates/) with:
+
+ ```ruby
+ carrot2.cluster(documents, template: "lingo")
  ```

  ## Configuration
@@ -97,25 +106,15 @@ To specify the Carrot2 server, set `ENV["CARROT2_URL"]` or use:
  Carrot2.new(url: "http://localhost:8080")
  ```

- Set timeouts [master]
+ Set timeouts

  ```ruby
  Carrot2.new(open_timeout: 3, read_timeout: 5)
  ```

- ## Heroku
+ ## Resources

- Carrot2 can be easily deployed to Heroku thanks to support for [WAR deployment](https://devcenter.heroku.com/articles/war-deployment).
-
- You can find the `.war` file in the `war` directory in the dcs download. Then run:
-
- ```sh
- heroku plugins:install heroku-cli-deploy
- heroku create <app_name>
- heroku war:deploy carrot2-dcs.war --app <app_name>
- ```
-
- And set `ENV["CARROT2_URL"]` in your application.
+ - [Carrot2 REST API Basics](https://carrot2.github.io/release/4.0.0/doc/rest-api-basics/)

  ## History

@@ -129,3 +128,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Fix bugs and [submit pull requests](https://github.com/ankane/carrot2/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone https://github.com/ankane/carrot2.git
+ cd carrot2
+ bundle install
+ bundle exec rake test
+ ```
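
Taken together, the `cluster` options documented in the README changes (`language`, `algorithm`, `parameters`, `template`) appear to map onto a JSON request body plus an optional query string. The sketch below restates that mapping as a standalone function so it can be run without a Carrot2 server. `build_cluster_request` is a hypothetical name, not part of the gem, and the logic is adapted from the `cluster` method shown in this diff rather than from the Carrot2 documentation.

```ruby
require "json"
require "uri"

# Hypothetical standalone helper (not the gem's API): returns the request
# path and JSON body that a cluster call would send.
def build_cluster_request(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
  # Defaults are applied only when no template is given, as in the diff.
  unless template
    language ||= "English"
    algorithm ||= "Lingo"
    parameters ||= {}
  end

  # Plain strings are wrapped in a one-field hash; hashes pass through as-is.
  data = {documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }}
  data[:language] = language if language
  data[:algorithm] = algorithm if algorithm
  data[:parameters] = parameters if parameters

  # The template, when present, travels in the query string, not the body.
  path = "service/cluster"
  path = "#{path}?#{URI.encode_www_form(template: template)}" if template

  [path, data.to_json]
end

path, body = build_cluster_request(
  ["Sign up for an exclusive coupon"],
  parameters: {preprocessing: {phraseDfThreshold: 1, wordDfThreshold: 1}}
)
puts path
puts body
```

One consequence of this design is that passing `template:` alone sends only the documents, leaving language, algorithm, and parameters to whatever the server-side template defines.
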
@@ -1,63 +1,81 @@
- require "carrot2/version"
- require "builder"
- require "net/http"
+ # dependencies
  require "json"
+ require "net/http"
+
+ # modules
+ require "carrot2/version"

  class Carrot2
    class Error < StandardError; end

-   def initialize(url: nil, open_timeout: 3, read_timeout: nil)
-     @url = url || ENV["CARROT2_URL"] || "http://localhost:8080"
+   HEADERS = {
+     "Content-Type" => "application/json",
+     "Accept" => "application/json"
+   }

-     # add dcs/rest
-     @url = "#{@url.sub(/\/\z/, "")}/dcs/rest"
-     @uri = URI.parse(@url)
+   def initialize(url: nil, open_timeout: 3, read_timeout: nil)
+     url ||= ENV["CARROT2_URL"] || "http://localhost:8080"
+     @uri = URI.parse(url)
+     @http = Net::HTTP.new(@uri.host, @uri.port)
+     @http.use_ssl = true if @uri.scheme == "https"
+     @http.open_timeout = open_timeout if open_timeout
+     @http.read_timeout = read_timeout if read_timeout
+   end

-     @open_timeout = open_timeout
-     @read_timeout = read_timeout
+   def list
+     get("service/list")
    end

-   def cluster(documents, language: "English")
-     xml = Builder::XmlMarkup.new
-     xml.instruct! :xml, version: "1.0", encoding: "UTF-8"
-     xml.searchresult do |s|
-       documents.each do |document|
-         s.document do |d|
-           d.title document
-         end
-       end
+   def cluster(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
+     # no defaults if template
+     unless template
+       language ||= "English"
+       algorithm ||= "Lingo"
+       parameters ||= {}
      end

-     request(
-       "dcs.clusters.only" => true,
-       "dcs.c2stream" => xml.target!,
-       "MultilingualClustering.defaultLanguage" => language.upcase,
-       multipart: true
-     )
+     # data
+     data = {
+       documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }
+     }
+     data[:language] = language if language
+     data[:algorithm] = algorithm if algorithm
+     data[:parameters] = parameters if parameters
+
+     # path
+     path = "service/cluster"
+     path = "#{path}?#{URI.encode_www_form(template: template)}" if template
+
+     post(path, data)
    end

-   def request(params)
-     req = Net::HTTP::Post.new(@uri)
-     req.set_form_data(params.merge("dcs.output.format" => "JSON"))
+   private

-     options = {
-       use_ssl: @uri.scheme == "https"
-     }
-     options[:open_timeout] = @open_timeout if @open_timeout
-     options[:read_timeout] = @read_timeout if @read_timeout
+   def get(path)
+     handle_response do
+       @http.get("#{@uri.request_uri.chomp("/")}/#{path}", HEADERS)
+     end
+   end

-     response = Net::HTTP.start(@uri.hostname, @uri.port, options) do |http|
-       http.request(req)
+   def post(path, data)
+     handle_response do
+       @http.post("#{@uri.request_uri.chomp("/")}/#{path}", data.to_json, HEADERS)
+     end
+   end
+
+   def handle_response
+     begin
+       response = yield
+     rescue Errno::ECONNREFUSED => e
+       raise Carrot2::Error, e.message
      end

-     if response.code == "200"
-       JSON.parse(response.body)
-     else
-       body = response.body.to_s
-       # try to get reason from title
-       m = body.match(/<title>(.+)<\/title>/)
-       message = m ? m[1] : body
+     unless response.kind_of?(Net::HTTPSuccess)
+       body = JSON.parse(response.body) rescue {}
+       message = body["message"] || "Bad response: #{response.code}"
        raise Carrot2::Error, message
      end
+
+     JSON.parse(response.body)
    end
  end
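
Two details of the new client are easy to miss in the diff: how the request path is joined onto the configured base URL, and how an error message is chosen for a non-success response. The sketch below isolates both as pure functions so they can be tried without a server. `request_path` and `error_message` are hypothetical names, and the sample bodies are invented. One deliberate difference from the diff: the sketch rescues only `JSON::ParserError` and guards against non-hash JSON, whereas the diff uses a bare `rescue {}` modifier.

```ruby
require "json"
require "uri"

# Hypothetical helper: joins a service path onto the base URL's request
# path, dropping a trailing slash the same way the diff does.
def request_path(base_url, path)
  base = URI.parse(base_url).request_uri.chomp("/")
  "#{base}/#{path}"
end

# Hypothetical helper: prefers the "message" field of a JSON error body and
# falls back to a generic message built from the status code.
def error_message(code, raw_body)
  parsed =
    begin
      JSON.parse(raw_body.to_s)
    rescue JSON::ParserError
      {}
    end
  message = parsed.is_a?(Hash) ? parsed["message"] : nil
  message || "Bad response: #{code}"
end

puts request_path("http://localhost:8080", "service/list")
puts request_path("https://example.com/carrot2/", "service/cluster")

# Invented sample bodies, for illustration only.
puts error_message("400", '{"message":"Unknown algorithm"}')
puts error_message("502", "<html>Bad Gateway</html>")
```

The fallback matters when a proxy in front of the server returns HTML instead of JSON: the caller still gets a `Carrot2::Error`-style message rather than a parse failure.
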
@@ -1,3 +1,3 @@
  class Carrot2
-   VERSION = "0.2.1"
+   VERSION = "0.3.0"
  end
metadata CHANGED
@@ -1,29 +1,15 @@
  --- !ruby/object:Gem::Specification
  name: carrot2
  version: !ruby/object:Gem::Version
-   version: 0.2.1
+   version: 0.3.0
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-10-28 00:00:00.000000000 Z
+ date: 2020-07-30 00:00:00.000000000 Z
  dependencies:
- - !ruby/object:Gem::Dependency
-   name: builder
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :runtime
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
  - !ruby/object:Gem::Dependency
    name: bundler
    requirement: !ruby/object:Gem::Requirement
@@ -96,7 +82,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.1.2
  signing_key:
  specification_version: 4
  summary: Ruby client for Carrot2